Methods of monitoring conditions by sequence analysis

ABSTRACT

There is a need for improved methods for determining the diagnosis and prognosis of patients with conditions, including autoimmune disease and cancer. Provided herein are methods for using DNA sequencing to identify personalized biomarkers in patients with autoimmune disease and other conditions. Identified biomarkers can be used to determine the disease state for a subject with an autoimmune disease or other condition.

CROSS REFERENCE

This application is a continuation application of co-pending U.S. utility application Ser. No. 13/459,701 filed 30 Apr. 2012, which is a continuation application of U.S. utility application Ser. No. 12/615,263, filed 9 Nov. 2009, now U.S. Pat. No. 8,236,503, which patent claims the benefit of U.S. Patent Application Ser. No. 61/112,693, filed 7 Nov. 2008; each of the foregoing applications is incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

The immune system comprises the innate and the adaptive immunity systems. The innate immune system comprises the cells and mechanisms utilizing generic methods to recognize foreign pathogens. Cells involved in innate immunity include neutrophils, natural killer cells, macrophages, monocytes, basophils, eosinophils, mast cells, and dendritic cells. These cells carry out the act of phagocytosis as well as the release of many chemicals that kill invading pathogens. In addition, these cells are involved in innate immunity defense mechanisms including the complement cascade and inflammation. Finally, some of these cells participate in the antigen presentation process that plays a role in the adaptive immunity system.

The adaptive immunity system has evolved to attack specific features on its targets. The occurrence of one response to a specific target provides the host with “memory” of it, causing the host to mount a stronger response if the target appears again. Usually any protein or polysaccharide can serve as the target for some subset of the adaptive immune response cells or their products that recognize specific epitopes on the target. The adaptive immune response is divided into two types: the humoral and the cell-mediated immune response, and B-cells and T-cells play the specificity roles in these responses, respectively.

Since autoimmune disease involves the recognition of self targets by some element of the adaptive immune system, aspects of the adaptive immune system have been examined to aid in diagnosis and prognosis. Using standard immunological techniques, the humoral immune system has been investigated by looking for circulating autoantibodies. Autoantibodies, like antinuclear, anti-dsDNA, and rheumatoid factor, have been identified for several diseases. These antibodies may not themselves be pathological, nor is the target they recognize in the body necessarily the same as that tested for in vitro; however, measurement of their levels aids in the diagnosis and in some cases has some prognostic and treatment implications.

Another methodology to study the adaptive immune system in autoimmune disease is based on the analysis of the diversity of the adaptive immune cells. Activation of the adaptive immune cells leads to their clonal expansion. Evidence of this clonal expansion is usually obtained by amplification from the blood RNA or DNA of part of the nucleic acid sequence coding for the antigen recognition region. For example, PCR primers to amplify sequences that have a specific V segment of the β chain in T-cell receptor (analogous to antibody heavy chain) are used to amplify the J segments or J and D segments connected to the specific V segment. When a diverse cell population is present, amplification is expected to yield amplicons with a distribution of slightly different sizes, but clonal expansion causes specific sizes to become enriched and thus more intense when visualized as bands on a gel. In the technique called spectratyping, each of the V segments is amplified with the J and D segments to assess whether any of these amplicons shows a clonal expansion.

One problem of the spectratyping approach is that many distinct sequences can have the same length and hence are indistinguishable. Therefore only dramatic clonal expansion can be discerned by this technique. There is a need to improve methods of diagnosing and aiding prognosis of autoimmune disease and autoimmune disease states as well as other diseases for which the immune system plays a central role.
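The limitation of spectratyping noted above can be illustrated with a minimal sketch. The read sequences below are invented for illustration only and are not taken from any sample; the function names `length_profile` and `sequence_profile` are likewise hypothetical. The sketch contrasts what a length-based readout can resolve with what a sequence-based readout can resolve.

```python
from collections import Counter

# Hypothetical reads (invented for illustration): three distinct
# 12-base sequences and one 9-base sequence.
reads = (
    ["TGTGCCAGCAGT"] * 5
    + ["TGTGCCAGCTCA"] * 1
    + ["TGTGCCAGCGGA"] * 1
    + ["TGTGCCAGC"] * 3
)

def length_profile(seqs):
    """Count reads by length only, as a gel band pattern would."""
    return Counter(len(s) for s in seqs)

def sequence_profile(seqs):
    """Count reads by exact sequence, as sequencing would."""
    return Counter(seqs)

by_length = length_profile(reads)
by_sequence = sequence_profile(reads)

# The length readout merges the three 12-base sequences into one band,
# while the sequence readout keeps them apart.
print(dict(by_length))
print(dict(by_sequence))
```

In this toy example the length readout reports only two size classes, whereas the sequence readout separates four distinct sequences and shows that one of the 12-base sequences accounts for most of its band.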

While additional specificity in profiling the immune system would be of great utility in allowing its impact on human health to be better predicted, still greater utility would be delivered if methods were developed that would allow the specific T and B cells involved in disease processes to be identified even if those particular sequences had never before been observed. The vast diversity of the immune system provides it with an immense reserve of potentially useful cells but also presents a challenge to the researcher trying to use this repertoire for predictive purposes. Any single sequence targeting an antigen is one of a vast number that could be involved with and/or correlated to the disease process in a given individual. Methods that would identify which of the many cells in a given individual are involved with disease processes would be of great value to human health.

SUMMARY OF THE INVENTION

In one aspect, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided comprising: obtaining a sample from a subject comprising T-cells and/or B-cells, spatially isolating individual molecules of genomic DNA from said cells, sequencing said spatially isolated individual molecules of genomic DNA, and determining the levels of different sequences from said sample to generate said profile of recombined DNA sequences.

In another aspect, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided comprising: obtaining a sample from a subject comprising T-cells and/or B-cells, spatially isolating individual molecules of genomic DNA from said cells, amplifying said individual molecules of genomic DNA, sequencing said amplified DNA, and determining the levels of different sequences from said sample to generate said profile of recombined DNA sequences.

In another aspect, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided comprising: obtaining a sample from a subject comprising T-cells and/or B-cells, amplifying genomic DNA from said cells, spatially isolating individual molecules of said amplified DNA, sequencing said spatially isolated individual molecules of amplified DNA, and determining the levels of different sequences from said sample to generate said profile of recombined DNA sequences.

In another aspect, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided comprising: obtaining a sample from a subject comprising T-cells and/or B-cells, amplifying genomic DNA from said cells, spatially isolating individual molecules of said amplified DNA, re-amplifying said amplified DNA molecules, sequencing said re-amplified DNA molecules, and determining the levels of different sequences from said sample to generate said profile of recombined DNA sequences.

In another aspect, a method for determining a profile of sequences of recombined DNA in T-cells and/or B-cells is provided comprising: obtaining a sample from a subject comprising T-cells and/or B-cells, reverse transcribing RNA from said cells to form cDNA, spatially isolating individual molecules of said cDNA, optionally re-amplifying said spatially isolated individual molecules of cDNA, sequencing said cDNA and/or re-amplified cDNA, and determining the levels of different sequences from said sample to generate said profile of recombined DNA sequences.

In another aspect, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided comprising: obtaining a sample from a subject comprising T-cells and/or B-cells, spatially isolating individual cells in said sample, sequencing individual molecules of nucleic acid from said cells, and determining the levels of different sequences from said sample to generate said profile of recombined DNA sequences.

In one embodiment, said amplifying and/or re-amplifying comprises PCR, multiplex PCR, TMA, NASBA, or LAMP. In another embodiment, said spatially isolating comprises subcloning said DNA or cDNA in vectors used to transform bacteria, separating said DNA or cDNA in two dimensions on a solid support, separating said DNA or cDNA in three dimensions in a solution with micelles, or separating molecules using micro-reaction chambers. In another embodiment, said amplifying and/or re-amplifying is by growth of bacteria harboring subcloned DNA or cDNA, amplification of DNA or cDNA on a slide, or amplification of DNA or cDNA on a bead.

In another embodiment, said sequencing comprises dideoxy sequencing. In another embodiment, said sequencing comprises sequencing by synthesis using reversibly terminated labeled nucleotides. In another embodiment, said sequencing comprises detection of pyrophosphate release on nucleotide incorporation. In another embodiment, said sequencing comprises allele specific hybridization to a library of labeled oligonucleotide probes. In another embodiment, said sequencing comprises sequencing by synthesis using allele specific hybridization to a library of labeled oligonucleotide probes followed by ligation of said probes. In another embodiment, said sequencing comprises real time monitoring of the incorporation of labeled nucleotides during a polymerization step.

In another embodiment, said recombined DNA sequences comprise T-cell receptor genes and/or immunoglobulin genes. In another embodiment, said sequencing comprises sequencing a subset of the full clonal sequences of immunoglobulin and/or T-cell receptor genes.

In another embodiment, said subset of the full clonal sequence comprises the V-D junction, D-J junction of an immunoglobulin or T-cell receptor gene, the full variable region of an immunoglobulin or T-cell receptor gene, the antigen recognition region, or the complementarity determining region 3 (CDR3). In another embodiment, said T-cell receptor genes comprise T-cell receptor β genes. In another embodiment, said immunoglobulin genes comprise immunoglobulin heavy genes. In another embodiment, said amplifying or re-amplifying comprises a plurality of primers complementary to V segments and one primer complementary to a C segment. In another embodiment, said amplifying or re-amplifying comprises a plurality of primers complementary to V segments and a plurality of primers complementary to C segments.

In another embodiment, said plurality of primers complementary to V segments comprises at least three different primers for each V segment and the plurality of primers complementary to C segments comprises at least 1, at least 2, at least 3, at least 4, at least 5, or at least 6 primers.

In another embodiment, said T- or B-cells are subsets of the total T and B cells. In another embodiment, said subset of T-cells are CD4+, CD8+ cells, or CD27^(high) cells. In another embodiment, said sample comprises at least 100,000, at least 500,000, at least 750,000, or at least 1,000,000 T-cells.

In another embodiment, said sequencing comprises at least 1000 reads per run, at least 10,000 reads per run, at least 100,000 reads per run, or at least 1,000,000 reads per run. In another embodiment, said sequencing comprises generating about 30 bp, about 40 bp, about 50 bp, about 60 bp, about 70 bp, about 80 bp, about 90 bp, about 100 bp, about 110 bp, or about 120 bp per read.

In another embodiment, said sample is taken when the subject is at a flare state of an autoimmune disease. In another embodiment, said sample is taken from a subject having or suspected of having systemic lupus erythematosus.

In another aspect, a method for determining one or more correlating clonotypes in a subject is provided comprising: generating one or more clonotype profiles by nucleic acid sequencing individual, spatially isolated molecules from at least one sample from the subject, wherein the at least one sample is related to a first state of the disease, and determining one or more correlating clonotypes in the subject based on the one or more clonotype profiles.

In one embodiment, said at least one sample is from a tissue affected by the disease. In another embodiment, said determination of one or more correlating clonotypes comprises comparing clonotype profiles from at least two samples.

In another embodiment, the first state of the disease is a peak state of the disease. In another embodiment, said one or more correlating clonotypes are present in the peak state of the disease. In another embodiment, said one or more correlating clonotypes are absent in the peak state of the disease. In another embodiment, said one or more correlating clonotypes are high in the peak state of the disease. In another embodiment, said one or more correlating clonotypes are low in the peak state of the disease.

In another embodiment, said sample comprises T-cells and/or B-cells. In another embodiment, said T-cells and/or B-cells comprise a subset of T-cells and/or B-cells. In another embodiment, said subset of T-cells and/or B-cells are enriched by interaction with a marker. In another embodiment, said marker is a cell surface marker on the subset of T-cells and/or B-cells. In another embodiment, said subset of T-cells and/or B-cells interact with an antigen specifically present in the disease.

In another embodiment, the disease is systemic lupus erythematosus or multiple sclerosis.
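By way of illustration only, the comparison of clonotype profiles from two samples described in this aspect can be sketched as follows. The fold-change criterion, the `min_fold` threshold, the `floor` value used for clonotypes that are absent from a sample, and the labels `high_at_peak` and `low_at_peak` are all assumptions introduced for this sketch; the specification does not prescribe a particular comparison rule.

```python
def correlating_clonotypes(peak, baseline, min_fold=10.0, floor=1e-6):
    """Flag clonotypes whose frequency differs strongly between two profiles.

    peak, baseline: dicts mapping clonotype identifier -> frequency.
    min_fold and floor are illustrative assumptions, not source values.
    """
    flagged = {}
    for clonotype in set(peak) | set(baseline):
        # A clonotype absent from a profile is treated as being at `floor`.
        p = max(peak.get(clonotype, 0.0), floor)
        b = max(baseline.get(clonotype, 0.0), floor)
        fold = p / b
        if fold >= min_fold:
            flagged[clonotype] = "high_at_peak"
        elif fold <= 1.0 / min_fold:
            flagged[clonotype] = "low_at_peak"
    return flagged

# Hypothetical profiles (frequencies invented for illustration).
peak_profile = {"c1": 0.20, "c2": 0.01, "c3": 0.05}
baseline_profile = {"c1": 0.01, "c2": 0.20, "c3": 0.05}

result = correlating_clonotypes(peak_profile, baseline_profile)
print(result)
```

Under these assumptions, a clonotype that is high or present at the peak state and a clonotype that is low or absent at the peak state are both reported, while a clonotype at the same level in both samples is not.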

In another aspect, a method for developing an algorithm that can predict one or more correlating clonotypes in any sample from a subject with a disease is provided comprising: a) generating a plurality of clonotype profiles from a set of samples, wherein the samples are relevant to the disease, b) identifying one or more correlating clonotypes from the set of samples, c) using sequence parameters and/or functional data from one or more correlating clonotypes identified in b) to develop the algorithm that can predict correlating clonotypes in any sample from a subject with the disease.

In one embodiment, the set of samples are taken from one or more tissues affected by the disease.

In another embodiment, said identification of one or more correlating clonotypes comprises comparing clonotype profiles from at least two samples.

In another embodiment, said functional data include binding ability of markers on T-cell and/or B-cell surface or interaction with antigen by a T-cell or B-cell.

In another embodiment, said sequence parameters comprise nucleic acid sequence and predicted amino acid sequence.

In another embodiment, the samples are from one or more individuals at a peak stage of the disease. In another embodiment, said one or more correlating clonotypes are present in the peak state of the disease. In another embodiment, said one or more correlating clonotypes are at a high level in the peak state of the disease. In another embodiment, said one or more correlating clonotypes are at a low level in the peak state of the disease. In another embodiment, the one or more correlating clonotypes are absent at the peak state of the disease. In another embodiment, the disease is systemic lupus erythematosus or multiple sclerosis.

In another embodiment, a method for discovering one or more correlating clonotypes for an individual is provided, comprising inputting a clonotype profile from a sample from the individual into an algorithm, and using the algorithm to determine one or more correlating clonotypes for the individual. In one embodiment, the algorithm is an algorithm that can predict one or more correlating clonotypes in any sample from a subject with a disease, said algorithm being developed by: a) generating a plurality of clonotype profiles from a set of samples, wherein the samples are relevant to the disease, b) identifying one or more correlating clonotypes from the set of samples, c) using sequence parameters and/or functional data from one or more correlating clonotypes identified in b) to develop an algorithm that can predict correlating clonotypes in any sample from a subject with the disease.

In one embodiment, said sample is taken at a peak state of disease. In another embodiment, the sample is taken from disease affected tissue.

In another aspect, a method for generating an algorithm that calculates a disease activity score is provided comprising: developing an algorithm that uses a set of factors to combine levels of correlating clonotypes into a disease activity score, comparing the disease activity score to clinical data regarding the disease state, and optimizing the factors in order to maximize the correlation between clinical data and the disease activity score.
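The optimization described in this aspect can be sketched, under stated assumptions, as follows. The sketch assumes that the disease activity score is a weighted sum of clonotype levels, that correlation is measured with the Pearson coefficient, and that the factors are chosen by exhaustive search over a small grid; none of these choices is specified in the text, and the data are invented for illustration.

```python
import itertools
import math

def activity_score(levels, factors):
    """Combine clonotype levels into one score (assumed weighted sum)."""
    return sum(f * x for f, x in zip(factors, levels))

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def optimize_factors(samples, clinical, grid=(0.0, 0.5, 1.0)):
    """Grid search (an assumed stand-in optimizer) for the factors that
    maximize correlation between the score and the clinical data."""
    best_r, best_factors = -2.0, None
    for factors in itertools.product(grid, repeat=len(samples[0])):
        scores = [activity_score(s, factors) for s in samples]
        r = pearson(scores, clinical)
        if r > best_r:
            best_r, best_factors = r, factors
    return best_factors, best_r

# Hypothetical data: levels of two correlating clonotypes in four samples,
# and a clinical measure that tracks only the first clonotype.
samples = [(0.01, 0.30), (0.05, 0.10), (0.10, 0.20), (0.20, 0.25)]
clinical = [1.0, 5.0, 10.0, 20.0]

factors, r = optimize_factors(samples, clinical)
print(factors, r)
```

In this toy data set the search assigns no weight to the second clonotype, because including it can only lower the correlation with the clinical measure. A practical implementation would likely use a continuous optimizer and held-out data rather than a coarse grid.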

In one embodiment, a method for monitoring the disease state of an individual is provided comprising: a) determining a clonotype profile from a sample from the individual, b) inputting the clonotype profile information from a) into an algorithm that calculates a disease activity score, wherein the algorithm is generated by developing an algorithm that uses a set of factors to combine levels of correlating clonotypes into a disease activity score, comparing the disease activity score to clinical data regarding the disease state, and optimizing the factors in order to maximize the correlation between clinical data and the disease activity score, and c) using the algorithm that calculates a disease activity score to generate a score predictive of the disease state of the individual.

In another embodiment, the method for monitoring the disease state of an individual further comprises determining one or more correlating clonotypes in the individual, and inputting information on the one or more correlating clonotypes into the algorithm.

In another embodiment, said determining one or more correlating clonotypes in the individual comprises a) generating one or more clonotype profiles by nucleic acid sequencing individual, spatially isolated molecules from at least one sample from the subject, wherein the at least one sample is related to a first state of the disease, and b) determining one or more correlating clonotypes in the subject based on the one or more clonotype profiles.

In another embodiment, said determining one or more correlating clonotypes in the individual comprises a) inputting a clonotype profile from a sample from the individual into an algorithm that can predict one or more correlating clonotypes, wherein said algorithm that can predict one or more correlating clonotypes is developed by i) generating a plurality of clonotype profiles from a set of samples, wherein the samples are relevant to the disease, ii) identifying one or more correlating clonotypes from the set of samples, iii) using sequence parameters and/or functional data from one or more correlating clonotypes identified in ii) to develop the algorithm that can predict correlating clonotypes in any sample from a subject with the disease, and b) using the algorithm that can predict one or more correlating clonotypes to determine one or more correlating clonotypes for the individual.

In another embodiment, the disease is systemic lupus erythematosus or multiple sclerosis.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 is a flow diagram of an embodiment of a method of the provided invention for determining clonotype profiles.

FIG. 2 shows a PCR scheme for amplifying TCRβ genes.

FIG. 3 illustrates a PCR product to be sequenced that was amplified using the scheme in FIG. 2.

FIGS. 4A and 4B illustrate a PCR scheme for amplifying isotype sequences.

FIG. 5 illustrates the reproducibility of multiplexed amplifications.

FIG. 6 illustrates that multiplexed amplifications have minimal amplification bias.

FIG. 7 illustrates agarose gel electrophoresis of multiplexed amplification of IgH sequences.

FIG. 8A shows the log₁₀ of the frequency of each clonotype in the two duplicate samples using Accuprime and cDNA corresponding to 500 ng of RNA as input template.

FIG. 8B depicts the log₁₀ of the frequency of each clonotype using cDNA corresponding to 500 ng of RNA as input template and Accuprime (X axis) or High fidelity Taq (Y axis).

FIG. 8C shows the log₁₀ of the frequency of each clonotype using cDNA corresponding to 50 ng of RNA as input template and Accuprime (X axis) or High fidelity Taq (Y axis).

FIG. 9 illustrates one embodiment of a scheme for linking two sequences to form one amplicon during an amplification reaction. Information on the presence of these two sequences in the same sample (e.g., cell) can then be preserved even if they are mixed with a pool of sequences from other samples.

FIG. 10 illustrates another embodiment of an amplification scheme for linking two sequences.

FIG. 11 illustrates another embodiment of an amplification scheme for linking two sequences.

FIGS. 12A and 12B illustrate a scheme for multiplexing a reaction linking two sequences by PCR.

FIGS. 13A-13D illustrate a scheme for linking three sequences together.

FIG. 14 illustrates a flow diagram for discovering correlating clonotypes using a calibration test.

FIG. 15 illustrates a flow diagram for discovering correlating clonotypes using a population study.

FIG. 16 illustrates a flow diagram for discovering correlating clonotypes using a population study and a calibration test.

FIG. 17 illustrates algorithms that can predict correlating clonotypes in a sample.

FIG. 18 illustrates a flow diagram for generating a monitoring algorithm for calculating Immune Load.

FIG. 19 illustrates a flow diagram for performing a monitoring test without a calibration test.

FIG. 20 illustrates a flow diagram for performing a monitoring test using a calibration test.

DETAILED DESCRIPTION OF THE INVENTION

Overview

In general, the provided invention includes methods for applying nucleic acid sequencing techniques to the task of monitoring the repertoire of adaptive immunity cells for profiling the immune system. The profiles of the immune system generated can be used for diagnosis of diseases and disorders, and for diagnosis of states of diseases and disorders. The methods of immune profiling of the provided invention can be used in monitoring diseases and disorders and assessing treatment of diseases and disorders. The diseases and disorders that the methods of the provided invention can be applied to include autoimmune disease, including systemic lupus erythematosus (SLE), multiple sclerosis (MS), rheumatoid arthritis (RA), and ankylosing spondylitis. The methods of the provided invention can be applied to the diagnosis, monitoring, and treatment of transplant rejection and immune aging. Furthermore, the methods of immune profiling of the provided invention can be used for diagnosing, monitoring, and treating other diseases related to the immune system, including cancer and infectious disease.

Sequencing individual amplified molecules can distinguish different sequences and hence has the sensitivity to detect quantitative changes in clonal expansion. In general, in one embodiment of the provided invention, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided. The method can comprise steps including isolating samples from a subject, one or more rounds of nucleic acid amplification, spatially isolating individual nucleic acids, and sequencing nucleic acids. The nucleic acids can be DNA or RNA. The recombined DNA sequences in T-cells and/or B-cells can be termed clonotypes.

In one aspect, a method for determining one or more correlating clonotypes in a subject or individual is provided. In another aspect, a method for developing an algorithm that can predict one or more correlating clonotypes in any sample from a subject with a disease is provided. In another aspect, a method for discovering one or more correlating clonotypes for an individual using an algorithm that can predict one or more correlating clonotypes in any sample from a subject is provided. In another aspect, a method for generating an algorithm that calculates a disease activity score is provided. In another aspect, a method for monitoring the disease state of an individual is provided.

I. Methods of Determining Clonotype Profiles

A. Overview

The methods of the provided invention can be used to generate profiles of recombined DNA sequences, or clonotypes, in a sample from a subject.

In one embodiment, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided including obtaining a sample from a subject comprising T-cells and/or B-cells, isolating individual molecules of genomic DNA from said cells, sequencing the isolated individual molecules of genomic DNA, and determining the levels of different sequences from the sample to generate said profile of recombined DNA sequences.

In another embodiment, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided including obtaining a sample from a subject comprising T-cells and/or B-cells, isolating individual molecules of genomic DNA from the cells, amplifying the individual molecules of genomic DNA, sequencing the amplified DNA, and determining the levels of different sequences from the sample to generate said profile of recombined DNA sequences.

In another embodiment, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided including obtaining a sample from a subject comprising T-cells and/or B-cells, amplifying genomic DNA from the cells, isolating individual molecules of the amplified DNA, sequencing the isolated individual molecules of amplified DNA, and determining the levels of different sequences from the sample to generate the profile of recombined DNA sequences.

In another embodiment, a method for determining a profile of recombined DNA sequences in T-cells and/or B-cells is provided including obtaining a sample from a subject including T-cells and/or B-cells, amplifying genomic DNA from the cells, isolating individual molecules of the amplified DNA, re-amplifying the amplified DNA molecules, sequencing the re-amplified DNA molecules, and determining the levels of different sequences from the sample to generate the profile of recombined DNA sequences.

In another embodiment, a method for determining a profile of sequences of recombined DNA in T-cells and/or B-cells is provided including obtaining a sample from a subject comprising T-cells and/or B-cells, isolating RNA from said sample, reverse transcribing the RNA from said cells to form cDNA, isolating individual molecules of said cDNA, optionally re-amplifying said cDNA, sequencing said isolated individual molecules of said cDNA or re-amplified DNA, and determining the levels of different sequences from said sample to generate said profile of recombined DNA sequences.

In another embodiment, a method for determining a profile of sequences of recombined DNA in T-cells and/or B-cells is provided including obtaining a sample from a subject including T-cells and/or B-cells, isolating individual molecules of RNA from said sample, sequencing the individual molecules of RNA, and determining the levels of different sequences from said sample to generate the profile of recombined DNA sequences.
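The final step common to the foregoing embodiments, determining the levels of different sequences to generate a profile, can be sketched minimally as follows. The sketch assumes that each sequencing read has already been reduced to a clonotype sequence and that the level of a clonotype is expressed as its fraction of all reads; the function name `clonotype_profile` and the example reads are hypothetical.

```python
from collections import Counter

def clonotype_profile(reads):
    """Return a dict mapping each distinct sequence to its read fraction.

    Assumes one read per sequenced molecule; error correction and
    clustering of near-identical reads are not modeled here.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}

# Hypothetical reads (invented for illustration).
example_reads = ["ACGT", "ACGT", "TTGA", "CCAG"]
profile = clonotype_profile(example_reads)
print(profile)
```

Expressing levels as fractions allows profiles from runs with different read depths to be compared, which is one plausible design choice among several; absolute counts or normalized counts could be used instead.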

B. Subjects and Samples

1. Subjects

The methods of the provided invention can use samples from subjects or individuals (e.g., patients). The subject can be a patient, for example, a patient with an autoimmune disease. The subject can be a patient with an infectious disease or cancer. The subject can be a mammal, for example, a human. The subject can be male or female. The subject can be an infant, a child, or an adult.

2. Samples

Samples used in the methods of the provided invention can include, for example, a bodily fluid from a subject, including amniotic fluid surrounding a fetus, aqueous humor, bile, blood and blood plasma, cerumen (earwax), Cowper's fluid or pre-ejaculatory fluid, chyle, chyme, female ejaculate, interstitial fluid, lymph, menses, breast milk, mucus (including snot and phlegm), pleural fluid, pus, saliva, sebum (skin oil), semen, serum, sweat, tears, urine, vaginal lubrication, vomit, water, feces, and internal body fluids, including cerebrospinal fluid surrounding the brain and the spinal cord, synovial fluid surrounding bone joints, intracellular fluid (the fluid inside cells), and vitreous humour (the fluid in the eyeball). In one embodiment, the sample is a blood sample. The blood sample can be about 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0 mL. The sample can be cerebrospinal fluid (CSF) when the subject has multiple sclerosis, synovial fluid when the subject has rheumatoid arthritis, and a skin (or other organ) biopsy when the subject has systemic lupus. In one embodiment, the clonotype can be identified from the available body fluid/tissue most likely to reflect pathology, followed by later monitoring of the levels of the clonotypes from a different body fluid, for example, blood.

Samples can be analyzed at a time when the disease is inactive.

The sample can be obtained by a health care provider, for example, a physician, physician assistant, nurse, veterinarian, dermatologist, rheumatologist, dentist, paramedic, or surgeon. The sample can be obtained by a research technician. More than one sample from a subject can be obtained.

The sample can be a biopsy, e.g., a skin biopsy. The biopsy can be from, for example, brain, liver, lung, heart, colon, kidney, or bone marrow. Any biopsy technique used by those skilled in the art can be used for isolating a sample from a subject. For example, a biopsy can be an open biopsy, in which general anesthesia is used. The biopsy can be a closed biopsy, in which a smaller cut is made than in an open biopsy. The biopsy can be a core or incisional biopsy, in which part of the tissue is removed. The biopsy can be an excisional biopsy, in which attempts to remove an entire lesion are made. The biopsy can be a fine needle aspiration biopsy, in which a sample of tissue or fluid is removed with a needle.

The sample can include immune cells. The immune cells can include T-cells and/or B-cells. T-cells (T lymphocytes) include, for example, cells that express T cell receptors. T-cells include Helper T cells (effector T cells or Th cells), cytotoxic T cells (CTLs), memory T cells, and regulatory T cells. The sample can include a single cell in some applications (e.g., a calibration test to define relevant T cells) or more generally at least 1,000, at least 10,000, at least 100,000, at least 250,000, at least 500,000, at least 750,000, or at least 1,000,000 T-cells.

B-cells include, for example, plasma B cells, memory B cells, B1 cells, B2 cells, marginal-zone B cells, and follicular B cells. B-cells can express immunoglobulins (antibodies, B cell receptor). The sample can include a single cell in some applications (e.g., a calibration test to define relevant B cells) or more generally at least 1,000, at least 10,000, at least 100,000, at least 250,000, at least 500,000, at least 750,000, or at least 1,000,000 B-cells.

The sample can include nucleic acid, for example, DNA (e.g., genomic DNA or mitochondrial DNA) or RNA (e.g., messenger RNA or microRNA). The nucleic acid can be cell-free DNA or RNA. In the methods of the provided invention, the amount of RNA or DNA from a subject that can be analyzed includes, for example, as little as that from a single cell in some applications (e.g., a calibration test) and as much as that from 10 million cells or more, translating to a range of DNA of 6 pg-60 µg and RNA of approximately 1 pg-10 µg.
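The stated ranges follow from simple per-cell arithmetic, sketched below. The per-cell values are inferred from the endpoints of the ranges given above (6 pg of DNA and approximately 1 pg of RNA for a single cell) and should be read as approximations; the constant and function names are hypothetical.

```python
# Per-cell masses inferred from the lower ends of the stated ranges.
PG_DNA_PER_CELL = 6.0   # picograms of DNA per cell (approximate)
PG_RNA_PER_CELL = 1.0   # picograms of RNA per cell (approximate)
PG_PER_UG = 1_000_000   # picograms in one microgram

def nucleic_acid_mass_ug(n_cells, pg_per_cell):
    """Total mass in micrograms for n_cells at pg_per_cell each."""
    return n_cells * pg_per_cell / PG_PER_UG

dna_ug = nucleic_acid_mass_ug(10_000_000, PG_DNA_PER_CELL)
rna_ug = nucleic_acid_mass_ug(10_000_000, PG_RNA_PER_CELL)
print(dna_ug, rna_ug)
```

Ten million cells at these per-cell values correspond to the upper ends of the stated ranges, namely 60 µg of DNA and 10 µg of RNA.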

C. Means for Isolating, Amplifying and Re-Amplifying Nucleic Acid

1. Characteristics of TCR and BCR Genes

Since the identifying recombinations are present in the DNA of each individual adaptive immunity cell as well as their associated RNA transcripts, either RNA or DNA can be sequenced in the methods of the provided invention. A recombined sequence from a T-cell or B-cell can also be referred to as a clonotype. The DNA or RNA can correspond to sequences from T-cell receptor (TCR) genes or immunoglobulin (Ig) genes that encode antibodies. For example, the DNA and RNA can correspond to sequences encoding α, β, γ, or δ chains of a TCR. In a majority of T-cells, the TCR is a heterodimer consisting of an α-chain and β-chain. The TCRα chain is generated by VJ recombination, and the β chain receptor is generated by V(D)J recombination. For the TCRβ chain, in humans there are 48 V segments, 2 D segments, and 13 J segments. Several bases may be deleted and others added (called N and P nucleotides) at each of the two junctions. In a minority of T-cells, the TCRs consist of γ and δ chains. The TCR γ chain is generated by VJ recombination, and the TCR δ chain is generated by V(D)J recombination (Kenneth Murphy, Paul Travers, and Mark Walport, Janeway's Immunology 7th edition, Garland Science, 2007, which is herein incorporated by reference in its entirety).

The DNA and RNA analyzed in the methods of the provided invention can correspond to sequences encoding heavy chain immunoglobulins (IgH) with constant regions (α, δ, ε, γ, or μ) or light chain immunoglobulins (IgK or IgL) with constant regions λ or κ. Each antibody has two identical light chains and two identical heavy chains. Each chain is composed of a constant (C) and a variable region. For the heavy chain, the variable region is composed of a variable (V), diversity (D), and joining (J) segments. Several distinct sequences coding for each type of these segments are present in the genome. A specific VDJ recombination event occurs during the development of a B-cell, marking that cell to generate a specific heavy chain. Diversity in the light chain is generated in a similar fashion except that there is no D region so there is only VJ recombination. Somatic mutation often occurs close to the site of the recombination, causing the addition or deletion of several nucleotides, further increasing the diversity of heavy and light chains generated by B-cells. The possible diversity of the antibodies generated by a B-cell is then the product of the different heavy and light chains. The variable regions of the heavy and light chains contribute to form the antigen recognition (or binding) region or site. Added to this diversity is a process of somatic hypermutation which can occur after a specific response is mounted against some epitope. In this process mutations occur in those B-cells that are able to recognize the specific epitope leading to greater diversity in antibodies that may be able to bind the specific epitope more strongly. All these factors contribute to great diversity of antibodies generated by the B-cells. Many billions and maybe more than a trillion distinct antibodies may be generated. The basic premise for generating T-cell diversity is similar to that for generating antibodies by B-cells. An element of T-cell and B-cell activation is their binding to foreign epitopes. The activation of a specific cell leads to the production of more of the same type of cells leading to a clonal expansion.

Complementarity determining regions (CDR), or hypervariable regions, are sequences in the variable domains of antigen receptors (e.g., T cell receptor and immunoglobulin) that can complement an antigen. The chain of each antigen receptor contains three CDRs (CDR1, CDR2, and CDR3). The two polypeptides making up T cell receptors (α and β) and immunoglobulins (IgH and IgK or IgL) contribute to the formation of the three CDRs.

The part of CDR1 and CDR2 that is coded for by TCRβ lies within one of 47 functional V segments. Most of the diversity of CDRs is found in CDR3, with the diversity being generated by somatic recombination events during the development of T lymphocytes.

A great diversity of BCRs is present both between and within individuals. The BCR is encoded by two genes, IgH and IgK (or IgL), coding for antibody heavy and light chains. Three Complementarity Determining Region (CDR) sequences that bind antigens and MHC molecules have the most diversity in IgH and IgK (or IgL). The part of CDR1 and CDR2 coded for by IgH lies within one of 44 functional V segments. Most of the diversity in naïve B cells emerges in the generation of CDR3 through somatic recombination events during the development of B lymphocytes. The recombination can generate a molecule with one of each of the V, D, and J segments. In humans, there are 44 V, 27 D, and 6 J segments; thus, there is a theoretical possibility of more than 7,000 combinations. In a small fraction of BCRs (~5%) two D segments are found. Furthermore, several bases may be deleted and others added (called N and P nucleotides) at each of the two junctions, generating a great degree of diversity. After B cell activation a process of affinity maturation through somatic hypermutation occurs. In this process progeny cells of the activated B cells accumulate distinct somatic mutations throughout the gene with higher mutation concentration in the CDR regions, leading to the generation of antibodies with higher affinity to the antigens. Therefore multiple primers can be utilized to amplify sequences after somatic hypermutation. In addition to somatic hypermutation, activated B cells undergo the process of isotype switching. Antibodies with the same variable segments can have different forms (isotypes) depending on the constant segment. Whereas all naïve B cells express IgM (or IgD), activated B cells mostly express IgG but also IgM, IgA and IgE. This expression switching from IgM (and/or IgD) to IgG, IgA, or IgE occurs through a recombination event causing one cell to specialize in producing a specific isotype. There is one segment for each IgM, IgD, and IgE, two segments for IgA, and four segments for IgG.
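The "more than 7,000 combinations" figure follows from multiplying the segment counts given above. The following is a minimal sketch of that arithmetic; it counts only one-V, one-D, one-J combinations and deliberately ignores junctional N and P nucleotides, double-D rearrangements, and somatic hypermutation, all of which increase diversity further. The names used are hypothetical.

```python
from math import prod

# Segment counts for human IgH as quoted in the text.
IGH_SEGMENTS = {"V": 44, "D": 27, "J": 6}


def segment_combinations(segment_counts: dict) -> int:
    """Number of ways to pick exactly one segment of each type."""
    return prod(segment_counts.values())


igh_combinations = segment_combinations(IGH_SEGMENTS)  # 44 * 27 * 6
```

With these counts the product is 7,128, consistent with the stated theoretical possibility of more than 7,000 combinations.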

2. Amplification Reactions

Polymerase chain reaction (PCR) can be used to amplify the relevant regions from a collection of cells. Transcription Mediated Amplification (TMA) can be used to produce RNA amplicons from a target nucleic acid. The nucleic acid from each cell can be analyzed separately, as each cell will carry its own unique signature.

TCRβ or immunoglobulin sequences can be amplified from nucleic acid in a multiplex reaction using at least one primer that anneals to the C region and one or more primers that can anneal to one or more V segments (FIG. 2 and FIG. 4). The number of primers that anneal to V segments in a multiplex reaction can be, for example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, or 80. The number of primers that anneal to V segments in a multiplex reaction can be, for example, 10-60, 20-50, 30-50, 40-50, 20-40, 30-40, or 35-40. The primers can anneal to different V segments. For IgH genes, because of the possibility of somatic mutations in the V segments, multiple primers that anneal to each V segment can be used; for example, 1, 2, 3, 4, or 5 primers per V segment. The number of primers that anneal to C segments in a multiplex reaction can include, for example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15. The number of primers that anneal to C segments in a multiplex reaction can be 1-10, 2-9, 3-8, 4-7, or 3-6. Amplification of TCR or immunoglobulin genes can occur as described in Example 3 and/or Example 4.

The region to be amplified can include the full clonal sequence or a subset of the clonal sequence, including the V-D junction, D-J junction of an immunoglobulin or T-cell receptor gene, the full variable region of an immunoglobulin or T-cell receptor gene, the antigen recognition region, or a CDR, e.g., complementarity determining region 3 (CDR3).

The TCR or immunoglobulin sequence can be amplified using a primary and a secondary amplification step. Each of the different amplification steps can comprise different primers. The different primers can introduce sequence not originally present in the immune gene sequence. For example, the amplification procedure can add one or more tags to the 5′ and/or 3′ end of amplified TCR or immunoglobulin sequence (FIG. 3). The tag can be sequence that facilitates subsequent sequencing of the amplified DNA. The tag can be sequence that facilitates binding the amplified sequence to a solid support.

Other methods for amplification may not employ any primers in the V region. Instead, a specific primer can be used from the C segment and a generic primer can be put on the other side (5′). The generic primer can be appended in the cDNA synthesis through different methods including the well described methods of strand switching. Similarly, the generic primer can be appended after cDNA synthesis through different methods including ligation.

Other means of amplifying nucleic acid that can be used in the methods of the provided invention include, for example, reverse transcription-PCR, real-time PCR, quantitative real-time PCR, digital PCR (dPCR), digital emulsion PCR (dePCR), clonal PCR, amplified fragment length polymorphism PCR (AFLP PCR), allele specific PCR, assembly PCR, asymmetric PCR (in which a great excess of primers for a chosen strand is used), colony PCR, helicase-dependent amplification (HDA), Hot Start PCR, inverse PCR (IPCR), in situ PCR, long PCR (extension of DNA greater than about 5 kilobases), multiplex PCR, nested PCR (uses more than one pair of primers), single-cell PCR, touchdown PCR, loop-mediated isothermal amplification (LAMP), and nucleic acid sequence based amplification (NASBA). Other amplification schemes include: Ligase Chain Reaction, Branch DNA Amplification, Rolling Circle Amplification, Circle to Circle Amplification, SPIA amplification, Target Amplification by Capture and Ligation (TACL) amplification, and RACE amplification.

The information in RNA in a sample can be converted to cDNA by using reverse transcription. PolyA primers, random primers, and/or gene specific primers can be used in reverse transcription reactions.

After amplification of DNA from the genome (or amplification of nucleic acid in the form of cDNA by reverse transcribing RNA), the individual nucleic acid molecules can be isolated, optionally re-amplified, and then sequenced individually.

Polymerases that can be used for amplification in the methods of the provided invention include, for example, Taq polymerase, AccuPrime polymerase, or Pfu. The choice of polymerase to use can be based on whether fidelity or efficiency is preferred.

In one embodiment, individual cells in a sample are isolated. Two or more sequences from each isolated cell can be linked together. For example, sequences from TCRα and TCRβ genes or IgH and IgK genes from an individual cell can be linked, for example by an amplification scheme (FIGS. 9-13) or a ligation scheme. The linked TCRα and TCRβ or IgH and IgK sequences for isolated cells can optionally be reamplified. The linked amplification products can be optionally repooled after amplification.

3. Means of Isolating Individual Nucleic Acids

Methods for isolation of nucleic acids from a pool include subcloning nucleic acid into DNA vectors and transforming bacteria (bacterial cloning), spatial separation of the molecules in two dimensions on a solid substrate (e.g., glass slide), spatial separation of the molecules in three dimensions in a solution within micelles (such as can be achieved using oil emulsions with or without immobilizing the molecules on a solid surface such as beads), or using microreaction chambers in, for example, microfluidic or nano-fluidic chips. Dilution can be used to ensure that on average a single molecule is present in a given volume, spatial region, bead, or reaction chamber.
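The effect of the dilution step can be explored numerically. The following is a minimal sketch, assuming that molecules distribute independently among compartments so that occupancy follows a Poisson distribution; that statistical model is an assumption introduced here for illustration and is not stated in the text, and the function name is hypothetical.

```python
from math import exp


def occupancy_fractions(mean_molecules_per_compartment: float):
    """Return (empty, single, multiple) compartment fractions.

    Assumes Poisson-distributed occupancy with the given mean.
    """
    lam = mean_molecules_per_compartment
    if lam < 0:
        raise ValueError("mean must be non-negative")
    empty = exp(-lam)            # P(0 molecules)
    single = lam * exp(-lam)     # P(exactly 1 molecule)
    multiple = 1.0 - empty - single
    return empty, single, multiple


# An average of one molecule per compartment versus a tenfold higher dilution.
at_mean_one = occupancy_fractions(1.0)
at_mean_tenth = occupancy_fractions(0.1)
```

Under this model, an average of one molecule per compartment still leaves a substantial fraction of compartments with more than one molecule, whereas further dilution leaves most occupied compartments with a single molecule at the cost of more empty compartments.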

Real time PCR, picogreen staining, nanofluidic electrophoresis (e.g., LabChip) or UV absorption measurements can be used in an initial step to judge the functional amount of amplifiable material.

Methods for re-amplification of nucleic acids include bacterial growth of isolated colonies transformed with nucleic acid, amplification on a slide (e.g., PCR colonies (polonies)), and amplification on a bead. The same method can be used to amplify and re-amplify the nucleic acid or a different method can be used to amplify and re-amplify the nucleic acid.

In certain embodiments the subcloning steps include a step in which a common primer is attached to the DNA or RNA through an amplification or ligation step. This primer is then used to amplify the clones and as a recognition sequence for hybridization of a primer for sequencing (FIG. 2).

In other embodiments, nucleic acids are analyzed from a subset of cells. A method to separate cells, for example by using a cell surface marker, can be employed. For example, cells can be isolated by cell sorting flow-cytometry, flow-sorting, fluorescent activated cell sorting (FACS), bead based separation such as magnetic cell sorting (MACS; e.g., using antibody coated magnetic particles), size-based separation (e.g., a sieve, an array of obstacles, or a filter), sorting in a microfluidics device, antibody-based separation, sedimentation, affinity adsorption, affinity extraction, or density gradient centrifugation. Cells can be purified by laser capture microdissection. Sorting can be based on cell size, morphology, or intracellular or extracellular markers. Methods for isolating or sorting tumor cells are described, for example, in Nagrath S. et al. (2007) Nature 450:1235-1239; U.S. Pat. Nos. 6,008,002, 7,232,653, and 7,332,288; PCT Publication No. WO2008157220A1; and US Patent Application Nos. US20080138805A1 and US20090186065; and Rosenberg R. et al. (2002) Cytometry 49:150-158, each of which is herein incorporated by reference in its entirety.

The subset of cells can be a subset of T-cells and/or B-cells. The subset of T cells can be CD4+, CD8+, or CD27^(high) cells.

Fluorescence-activated cell sorting (FACS) uses light scattering and fluorescent characteristics to sort cells. A fluorescent property can be imparted on a cell using, e.g., nucleic acid probes or antibodies conjugated to a fluorescent dye. A cell suspension can form a stream of flowing liquid. The stream of cells forms drops that contain approximately one cell per drop. Before the stream forms drops, a fluorescent characteristic of each cell is measured. A charge is placed on an electrical charging ring prior to fluorescence intensity measurement and the opposite charge is carried on the drop as it breaks from the stream. The charged drops pass through two high voltage deflection plates that divert drops into different containers based upon their charge. The charge can be directly applied to the stream and the drop breaking off retains the charge of the same sign as the stream. The stream is then returned to neutral after the drop breaks off.

Direct or indirect immunofluorescence can be used in FACS. In direct immunofluorescence, an antibody is directly conjugated to a fluorescent dye. In indirect immunofluorescence, the primary antibody is not labeled, and a secondary antibody is conjugated to a fluorescent dye.

In one embodiment, individual cells from a sample can be isolated. Sequence information from two or more genes in a cell can be linked together. For example, a sample can be from a patient with an autoimmune disease, and sequence information from TCRα and TCRβ genes from spatially isolated cells from the sample can be physically linked by, for example, an amplification scheme or a ligation scheme. The linked TCRα and TCRβ sequences can optionally be amplified and/or pooled with linked sequences from other cells. The linked sequences can alternatively be for IgH and IgK or for IgH and IgL.

C. Sequencing Techniques

Any technique for sequencing nucleic acid known to those skilled in the art can be used in the methods of the provided invention. DNA sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, sequencing by synthesis using reversibly terminated labeled nucleotides, pyrosequencing, 454 sequencing, allele specific hybridization to a library of labeled oligonucleotide probes, sequencing by synthesis using allele specific hybridization to a library of labeled clones that is followed by ligation, real time monitoring of the incorporation of labeled nucleotides during a polymerization step, polony sequencing, and SOLiD sequencing. Sequencing of the separated molecules has more recently been demonstrated by sequential or single extension reactions using polymerases or ligases as well as by single or sequential differential hybridizations with libraries of probes. These reactions have been performed on many clonal sequences in parallel including demonstrations in current commercial applications of over 100 million sequences in parallel. These sequencing approaches can thus be used to study the repertoire of T-cell receptor (TCR) and/or B-cell receptor (BCR).

The sequencing technique used in the methods of the provided invention can generate at least 1000 reads per run, at least 10,000 reads per run, at least 100,000 reads per run, at least 500,000 reads per run, or at least 1,000,000 reads per run.

The sequencing technique used in the methods of the provided invention can generate about 30 bp, about 40 bp, about 50 bp, about 60 bp, about 70 bp, about 80 bp, about 90 bp, about 100 bp, about 110 bp, about 120 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, about 500 bp, about 550 bp, or about 600 bp per read.

The sequencing technique used in the methods of the provided invention can generate at least 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 150, 200, 250, 300, 350, 400, 450, 500, 550, or 600 bp per read.

1. True Single Molecule Sequencing

A sequencing technique that can be used in the methods of the provided invention includes, for example, Helicos True Single Molecule Sequencing (tSMS) (Harris T. D. et al. (2008) Science 320:106-109). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3′ end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface. The templates can be at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides to the primer in a template directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are detected by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step.

2. 454 Sequencing

Another example of a DNA sequencing technique that can be used in the methods of the provided invention is 454 sequencing (Roche) (Margulies, M et al. 2005, Nature, 437, 376-380). 454 sequencing involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which contains a 5′-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (pico-liter sized). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated.

Pyrosequencing makes use of pyrophosphate (PPi) which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5′ phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is detected and analyzed.
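Because the light signal is proportional to the number of nucleotides incorporated in each flow, a sequence can be read out as a series of per-flow counts. The following is an idealized, noise-free sketch of that relationship; the cyclic flow order "TACG", the function names, and the cap on the number of flows are assumptions made for illustration, and the input is the strand being synthesized (the complement of the template).

```python
def flowgram(synthesized_strand: str, flow_order: str = "TACG",
             max_flows: int = 400):
    """Return a list of (base, incorporated_count) pairs, one per flow.

    Idealized model: each flow offers one nucleotide type, and the signal
    equals the length of the homopolymer run incorporated in that flow.
    """
    signals = []
    pos = 0
    for flow in range(max_flows):  # cap guards against unexpected symbols
        if pos >= len(synthesized_strand):
            break
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(synthesized_strand) and synthesized_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


def decode(signals) -> str:
    """Rebuild the synthesized strand from per-flow counts."""
    return "".join(base * count for base, count in signals)


signals = flowgram("TTACCCG")
```

In this toy example the three consecutive C bases are incorporated in a single flow and would appear as one signal roughly three times the strength of a single incorporation.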

3. SOLiD Sequencing

Another example of a DNA sequencing technique that can be used in the methods of the provided invention is SOLiD technology (Applied Biosystems). In SOLiD sequencing, genomic DNA is sheared into fragments, and adaptors are attached to the 5′ and 3′ ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5′ and 3′ ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5′ and 3′ ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3′ modification that permits bonding to a glass slide.

The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed and the process is then repeated.

4. SOLEXA Sequencing

Another example of a sequencing technology that can be used in the methods of the provided invention is SOLEXA sequencing (Illumina). SOLEXA sequencing is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. Genomic DNA is fragmented, and adapters are added to the 5′ and 3′ ends of the fragments. DNA fragments that are attached to the surface of flow cell channels are extended and bridge amplified. The fragments become double stranded, and the double stranded molecules are denatured. Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded DNA molecules of the same template in each channel of the flow cell. Primers, DNA polymerase and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded. The 3′ terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated.

5. SMRT Sequencing

Another example of a sequencing technology that can be used in the methods of the provided invention includes the single molecule, real-time (SMRT™) technology of Pacific Biosciences. In SMRT, each of the four DNA bases is attached to one of four different fluorescent dyes. These dyes are phospholinked. A single DNA polymerase is immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW). A ZMW is a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Detection of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.

6. Nanopore Sequencing

Another example of a sequencing technique that can be used in the methods of the provided invention is nanopore sequencing (Soni G V and Meller A. (2007) Clin Chem 53: 1996-2001). A nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.

7. Chemical-Sensitive Field Effect Transistor Array Sequencing

Another example of a sequencing technique that can be used in the methods of the provided invention involves using a chemical-sensitive field effect transistor (chemFET) array to sequence DNA (for example, as described in US Patent Application Publication No. 20090026082). In one example of the technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3′ end of the sequencing primer can be detected by a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, and the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.

8. Sequencing with an Electron Microscope

Another example of a sequencing technique that can be used in the methods of the provided invention involves using an electron microscope (Moudrianakis E. N. and Beer M. Proc Natl Acad Sci USA. 1965 March; 53:564-71). In one example of the technique, individual DNA molecules are labeled using metallic labels that are distinguishable using an electron microscope. These molecules are then stretched on a flat surface and imaged using an electron microscope to measure sequences.

Any one of the sequencing techniques described herein can be used in the methods of the provided invention.

D. Methods for Sequencing the TCR and BCR Repertoire

Sequences can be read that originate from a single molecule or that originate from amplifications from a single molecule. Millions of independent amplifications of single molecules can be performed in parallel either on a solid surface or in tiny compartments in water/oil emulsion. The DNA sample to be sequenced can be diluted and/or dispersed sufficiently to obtain one molecule in each compartment. This dilution can be followed by DNA amplification to generate copies of the original DNA sequences, creating “clusters” of molecules all having the same sequence. These clusters can then be sequenced. Many millions of reads can be generated in one run. Sequence can be generated starting at the 5′ end of a given strand of an amplified sequence and/or sequence can be generated starting from the 5′ end of the complementary sequence. In a preferred embodiment, sequence from both strands is generated, i.e., paired end reads.

The prevalence of a particular sequence in the original DNA sequence can then be measured by counting how many clusters carry that sequence. More prevalent sequences in the original sample lead to more compartments and more clusters containing the specific sequences.
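The counting step can be sketched as follows. This is a minimal illustration, assuming each cluster has already been reduced to a single error-free sequence string; the toy sequences and the function name are hypothetical.

```python
from collections import Counter


def clonotype_frequencies(cluster_sequences):
    """Map each distinct sequence to its fraction of all clusters."""
    counts = Counter(cluster_sequences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n / total for seq, n in counts.items()}


# Hypothetical toy data: one sequence per sequenced cluster.
clusters = ["CASSLGT", "CASSLGT", "CASSPDR", "CASSLGT", "CAWSVGN"]
frequencies = clonotype_frequencies(clusters)
```

Here the sequence observed in three of the five clusters is assigned a frequency of 0.6, reflecting its higher prevalence in the starting material.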

Methods can be used in the amplification schemes to ensure that the frequency of the DNA sequences measured matches the frequency of the DNA sequence in the original sample. The methods can include ensuring that PCR primer concentrations are high enough to drive each hybridization reaction to saturation in each cycle, adjusting individual primer concentrations to minimize the differential amplification of different sequences, etc.

Algorithms can be used to determine which sequences generated by the sequencer originate from the DNA sequence. Individually measured sequences (reads) may be offset relative to each other or contain errors introduced by amplification and/or by sequencing. An algorithm can be used to combine reads together to more accurately determine the frequency of a DNA sequence in the starting material.
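One simple way to combine reads is to fold rare sequences into a more abundant sequence that differs by only a few bases. The following sketch illustrates that idea only; it is not the algorithm of the provided invention, it handles substitution errors but not offsets or insertions and deletions, and the mismatch threshold and function names are assumptions.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_error_reads(reads, max_mismatches: int = 1):
    """Fold low-abundance reads into a similar, more abundant sequence."""
    counts = Counter(reads)
    merged = {}
    # Visit sequences from most to least abundant (ties broken by sequence).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = next(
            (p for p in merged
             if len(p) == len(seq) and hamming(p, seq) <= max_mismatches),
            None,
        )
        if parent is None:
            merged[seq] = n        # treat as a distinct sequence
        else:
            merged[parent] += n    # treat as an error copy of parent
    return merged


reads = ["ACGTAC"] * 5 + ["ACGTAA"] + ["TTGCAA"] * 3
merged_counts = merge_error_reads(reads)
```

In this toy input the single read differing by one base from the most abundant sequence is counted with it, while the unrelated sequence remains separate. A threshold that is too permissive would merge genuinely distinct clonotypes, so the choice of threshold matters.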

A million sequencing reads or more for IgH and/or TCRβ originally amplified from a blood sample comprising DNA or RNA can be obtained. The number of reads for a specific IgH or TCRβ sequence relates to the frequency of the specific clonotype in the blood sample. Therefore, the quantity of each of the clonotypes can be determined. If the pathogenic clonotypes for a particular patient are known, their level can be determined accurately through this sequencing approach.

In certain embodiments of the provided invention, a collection of DNA molecules including a representation of the genomic DNA or reverse transcribed RNA from the TCR and BCR regions of immune cells from one or more subjects is extracted and optionally amplified in such a way that each molecule can be sequenced using one or more of the sequencing techniques described above in order to be able to detect the presence and frequency of sequences in a subject.

Different regions of immunoglobulin or T cell receptor genes can be sequenced. In some embodiments, the full sequence of the variable regions can be sequenced to identify and quantify a clonotype.

A unique subset of the full clonal sequences can be sequenced. In some embodiments, nucleotides comprising the VD and the DJ junctions are sequenced to uniquely identify and quantify a clonotype. In other embodiments, the fragment that can be sequenced is the full variable region. In yet another embodiment, the antigen recognition region or the complementarity determining region 3 (CDR3) is sequenced. A fragment containing the full CDR3 or the full variable region can be amplified to allow the sequencing of the CDR3 comprising parts of the V, D, and J segments.

One or more tags on amplified products can be used for sequencing immunoglobulin or T cell receptor genes. One or more primers that anneal to the tags can be used in the sequencing reactions. Different sections of an amplified molecule can be sequenced in separate reactions, and the sequencing results can be pieced together to generate a partial or a full sequence of the molecule.

In one embodiment, only the CDR3 is amplified and sequenced. Amplification and sequencing of the CDR3 can be accomplished by using primers specific to one or more V segment sequences (as well as one or more primer(s) on the other side of the amplicon in the C segment). Primers for each of the V segments can be utilized in one or more amplification reactions leading to the amplification of the full repertoire of sequences. This repertoire of sequences can then be mixed and subjected to separation, with or without amplification, and sequenced using any of the sequencing techniques described. When the amplification with the various V primers is done in separate tubes, the number of molecules carrying the different V segments can be “normalized” due to PCR saturation. For example, if one particular V segment had one or several clonal expansions leading to its being represented more than other segments, this information may be erased or decreased since the PCR reaction for each segment can be driven to saturation or close to it. Real time PCR can be used to quantify how much of each V segment is present. The full CDR3 can be sequenced, or a subset of the CDR3 sequence can be sequenced.

In one embodiment, only a subset of clonotypes is analyzed. This can be accomplished by amplifying with a primer specific to the subset of clonotypes, for example, a primer that is specific to the V segment. Unique clonotypes can be identified by sequencing with long contiguous reads that provide full connectivity. In some embodiments, when several sequences of interest are present, a short read length across only one of the junctions can generate degenerate tags that are not unique to a specific clonotype but are shared among multiple clonotypes. For example, sequencing across the V/J junction can lump all the sequences with the same V/J irrespective of the D segment as one clonotype. Information on the full connectivity of all segments allows sequences to be distinguished that may share the same V and J segments but are connected to different D segments, for example.
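The loss of resolution from a read that spans only one junction can be illustrated with a minimal sketch. It assumes, for illustration only, that each sequenced molecule has already been reduced to hypothetical V, D, and J segment labels; in practice a clonotype would be defined by the junction nucleotide sequence itself, and the names `reads` and `count_clonotypes` are not part of the described methods.

```python
from collections import Counter

# Hypothetical reads: one (V segment, D segment, J segment) tuple per sequenced molecule.
reads = [
    ("V1", "D1", "J2"),
    ("V1", "D2", "J2"),
    ("V1", "D1", "J2"),
    ("V3", "D1", "J1"),
]

def count_clonotypes(reads, key):
    """Count molecules per clonotype, where `key` defines what identifies a clonotype."""
    return Counter(key(read) for read in reads)

# Short read across the V/J junction only: the D segment is ignored (degenerate tag).
vj_only = count_clonotypes(reads, key=lambda r: (r[0], r[2]))

# Long contiguous read with full connectivity: V, D, and J all contribute.
full_connectivity = count_clonotypes(reads, key=lambda r: r)

print(len(vj_only), len(full_connectivity))
```

In this toy data the V/J-only key merges the two clonotypes that share V1 and J2 but differ in their D segment, while the full-connectivity key keeps them apart.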

The same analysis can be done when only the V and J are present (e.g., the light chain of an antibody or the α subunit in TCR). The full diversity of TCR and BCR incorporates both subunits. However, it is possible to do the analysis on the sequences of both subunits.

Errors generated by sequencing and/or by amplification can be taken into account when generating the clonotype profile. For example, see Example 5.

The initial amplification can be done from DNA or RNA (e.g., after conversion to cDNA).

II. Methods for Determining Correlating Clonotypes, Disease Activity Scores, and Algorithms for Determining Either or Both

A. Correlating Versus Non-Correlating Clonotypes

The vast repertoire of T and B cell receptor sequences creates a challenge in finding individual cells that are correlated with specific human health outcomes. In many cases the sequences of clonotypes that will be of interest will be unique to the individual being studied. The methods of the present invention provide means for distinguishing a) correlating clonotypes (which can be those clonotypes whose levels correlate with disease) from b) non-correlating clonotypes (which can be those clonotypes whose levels do not correlate with disease). In one embodiment, a correlating clonotype can display either positive or negative correlation with disease. In another embodiment, a clonotype present at a peak state of a disease but not present at a non-peak state of a disease can be a correlating clonotype (positive correlation with disease). In another embodiment, a clonotype that is more abundant (i.e., is present at a higher level of molecules) in a peak state (or stage) of a disease than at a non-peak state of the disease can be a correlating clonotype (positive correlation with the disease). In another embodiment, a clonotype absent at a peak state of a disease but present during a non-peak state of the disease can be a correlating clonotype (negative correlation with disease). In another embodiment, a clonotype that is less abundant at a peak state of a disease than at a non-peak state of a disease can be a correlating clonotype (negative correlation with disease). In another embodiment, a correlating clonotype for an individual is determined by an algorithm.
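The presence/absence and abundance criteria above can be sketched as a simple rule. This is an illustrative sketch only, not the algorithm of any embodiment: the function name `classify_clonotype` and the fold-change threshold `min_fold` are assumptions introduced here, and levels are taken to be counts of molecules measured at a peak and a non-peak state.

```python
def classify_clonotype(peak_level, non_peak_level, min_fold=2.0):
    """Label a clonotype 'positive', 'negative', or 'non-correlating'.

    peak_level / non_peak_level: clonotype levels (e.g., molecule counts) at a
    peak and a non-peak state of disease. min_fold is an assumed threshold.
    """
    if peak_level == 0 and non_peak_level == 0:
        return "non-correlating"
    if non_peak_level == 0:
        return "positive"   # present at peak, absent at non-peak
    if peak_level == 0:
        return "negative"   # absent at peak, present at non-peak
    ratio = peak_level / non_peak_level
    if ratio >= min_fold:
        return "positive"   # more abundant at peak
    if ratio <= 1.0 / min_fold:
        return "negative"   # less abundant at peak
    return "non-correlating"

for levels in [(100, 0), (0, 50), (300, 100), (100, 300), (110, 100)]:
    print(levels, classify_clonotype(*levels))
```

A real study would likely replace the fixed threshold with a statistical test across multiple time points; the sketch only mirrors the four cases enumerated in the text.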

B. Discovering Correlating and Non-Correlating Clonotypes Using a Calibration Test without a Population Study

In this embodiment of the invention, correlating clonotypes are identified by looking at the clonotypes present in some sample that has relevance to a disease state (e.g., see FIG. 14). This sample could be blood taken at a peak state of disease (e.g., a blood sample from an MS or lupus patient during an acute flare), or affected tissue that is presumed to be enriched for T and B cells involved in the disease for that individual. Examples of these tissues could be kidney biopsies of lupus patients with kidney inflammation, CSF in MS patients during a flare, synovial fluid for rheumatoid arthritis patients, or tumor samples from cancer patients. In all of these examples, it is likely that the tissues will contain relevant T and B cells that are related to the disease (though not necessarily the causative agents). It is notable that if this method is used to identify the clonotypes that are relevant to disease, they will only be relevant to the individual in whose sample they were detected. As a result, a specific calibration test will be needed in order to use this method to identify correlating clonotypes in any given individual with a disease.

In one embodiment, a method for determining one or more correlating clonotypes in a subject is provided. The method can include steps for a) generating one or more clonotype profiles by nucleic acid sequencing individual, spatially isolated molecules from at least one sample from the subject, wherein the at least one sample is related to a first state of the disease, and b) determining one or more correlating clonotypes in the subject based on the one or more clonotype profiles.

In one embodiment, at least one sample is from a tissue affected by the disease. In another embodiment, said determination of one or more correlating clonotypes comprises comparing clonotype profiles from at least two samples. In another embodiment, the first state of the disease is a peak state of the disease. In another embodiment, one or more correlating clonotypes are present in the peak state of the disease. In another embodiment, the one or more correlating clonotypes are absent in the peak state of the disease. In another embodiment, one or more correlating clonotypes are high in the peak state of the disease. In another embodiment, one or more correlating clonotypes are low in the peak state of the disease. In another embodiment, the sample comprises T-cells and/or B-cells. In another embodiment, the T-cells and/or B-cells comprise a subset of T-cells and/or B-cells. In another embodiment, the subset of T-cells and/or B-cells is enriched by interaction with a marker. In another embodiment, the marker is a cell surface marker on the subset of T-cells and/or B-cells. In another embodiment, the subset of T-cells and/or B-cells interacts with an antigen specifically present in the disease.

In one embodiment, the disease is an autoimmune disease. In another embodiment, the autoimmune disease is systemic lupus erythematosus, multiple sclerosis, rheumatoid arthritis, or ankylosing spondylitis.

C. Discovering Correlating and Non-Correlating Clonotypes Using a Population Study

In one embodiment, a method is provided for identifying correlating clonotypes using a population study (e.g., see FIG. 15). The utility of the population study is that it allows the specific information about correlating clonotypes that have been ascertained in individuals with known disease state outcomes to be generalized to allow such correlating clonotypes to be identified in all future subjects without the need for a calibration test. Knowledge of a specific set of correlating clonotypes can be used to extract rules about the likely attributes (parameters) of clonotypes that will correlate in future subjects.

In one embodiment, the provided invention encompasses methods that include identifying correlating and non-correlating clonotypes by sequencing the immune cell repertoire in a study of samples from patients with disease(s) and optionally healthy controls at different times and, in the case of the patients with a disease, at different (and known) states of the disease course characterized by clinical data. The disease can be, for example, an autoimmune disease. The clonotypes whose level is correlated with measures of disease in these different states can be used to develop an algorithm that predicts the identity of a larger set of sequences that will correlate with disease as distinct from those that will not correlate with disease in all individuals. Unlike the case of the calibration test, correlating sequences need not have been present in the discovery study but can be predicted based on these sequences. For example, a correlating sequence can be a TCR gene DNA sequence that encodes the same amino acid sequence as the DNA sequence of a clonotype identified in the discovery study. Furthermore, the algorithm that can predict one or more correlating clonotypes can be used to identify clonotypes in a sample from any individual and is in no way unique to a given individual, thus allowing the correlating clonotypes to be predicted in a novel sample without prior knowledge of the clonotypes present in that individual.

In one aspect, a method for developing an algorithm that predicts one or more correlating clonotypes in any sample from a subject with a disease is provided comprising: a) generating a plurality of clonotype profiles from a set of samples, wherein the samples are relevant to the disease, b) identifying one or more correlating clonotypes from the set of samples, and c) using sequence parameters and/or functional data from one or more correlating clonotypes identified in b) to develop an algorithm that can predict correlating clonotypes in any sample from a subject with the disease.

In one embodiment, the set of samples is taken from one or more tissues affected by the disease.

In another embodiment, the identifying one or more correlating clonotypes comprises comparing clonotype profiles from at least two samples. In another embodiment, the functional data include binding ability of markers in T-cells and/or B-cells or interaction with antigen by a T-cell or B-cell. In another embodiment, said sequence parameters comprise nucleic acid sequence and predicted amino acid sequence. In another embodiment, the samples are from one or more individuals at a peak stage of the disease. In another embodiment, said one or more correlating clonotypes are present in the peak state of the disease. In another embodiment, said one or more correlating clonotypes are at a high level in the peak state of the disease. In another embodiment, one or more correlating clonotypes are at a low level in the peak state of the disease. In another embodiment, one or more correlating clonotypes are absent at the peak state of the disease.

In one embodiment, the disease is an autoimmune disease. In another embodiment, the autoimmune disease is systemic lupus erythematosus, multiple sclerosis, rheumatoid arthritis, or ankylosing spondylitis.

In another aspect, a method for discovering one or more correlating clonotypes for an individual is provided, comprising a) inputting a clonotype profile from a sample from the individual into an algorithm, and b) using the algorithm to determine one or more correlating clonotypes for the individual. The algorithm can be an algorithm developed by: a) generating a plurality of clonotype profiles from a set of samples, wherein the samples are relevant to the disease, b) identifying one or more correlating clonotypes from the set of samples, and c) using sequence parameters and/or functional data from one or more correlating clonotypes identified in b) to develop the algorithm that can predict correlating clonotypes in any sample from a subject with the disease.

D. Discovering Correlating and Non-Correlating Clonotypes Using a Calibration Test Combined with a Population Study

In one embodiment of the invention the correlating clonotypes are identified by using a calibration test combined with a population study (e.g., see FIG. 17). In this embodiment the population study does not result in an algorithm that allows clonotypes to be predicted in any sample but rather it allows an algorithm to be developed to predict correlating clonotypes in any sample from a subject for whom a particular calibration clonotype profile has been generated (e.g., see FIG. 16). An example of this could be the development of an algorithm that would predict the correlating clonotypes in a lupus patient based on the clonotype profile measured from a blood sample at any stage of disease, after first having had a blood test taken during a clinical flare state that was used to calibrate the algorithm.

In this embodiment the provided invention encompasses methods for identifying correlating and non-correlating clonotypes by sequencing the immune cell repertoire in a study of samples from patients with disease(s) and optionally healthy controls at different times and, in the case of the patients with a disease, at different (and known) states of the disease course characterized by clinical data. The clonotypes that are found at different frequency (or level) in the first state than in the second state are then used to develop an algorithm that predicts which of the sequences found in the repertoires of each individual at the first disease state will correlate with disease at the later state in each individual as distinct from those that will not correlate with disease in that individual. Unlike the case of the calibration test alone, correlating sequences may be a subset of all the sequences found to be different between disease states. It is also possible that correlating clonotypes are not found in the calibration sample but are predicted based on the algorithm to be correlating if they appear in a future sample. As an example, a clonotype that codes for the same amino acid sequence as a clonotype found in a calibration sample may be predicted to be a correlating clonotype based on the algorithm that results from the population study. Unlike the previous embodiments, the algorithm is developed to predict the correlating clonotypes based on a calibration clonotype profile, which is a clonotype profile generated at a specific state of disease in the individual for whom the correlating clonotypes are to be predicted. In this embodiment the algorithm cannot be used to generate correlating clonotypes in a particular individual until a specific calibration clonotype profile has been measured. After this calibration profile has been measured in a particular subject, all subsequent correlating clonotypes can be predicted based on the measurement of the clonotype profiles in that individual.

In another aspect, a method for discovering one or more correlating clonotypes for an individual is provided, comprising a) inputting a clonotype profile from a sample from the individual into an algorithm, and b) using the algorithm to determine one or more correlating clonotypes for the individual. The algorithm can be an algorithm developed by: a) generating a plurality of clonotype profiles from a set of samples, wherein the samples are relevant to the disease, b) identifying one or more correlating clonotypes from the set of samples, and c) using sequence parameters and/or functional data from one or more correlating clonotypes identified in b) to develop an algorithm that can predict correlating clonotypes in any sample from a subject with the disease. In one embodiment, the sample is taken at a peak state of disease. In another embodiment, the sample is taken from disease-affected tissue.

E. Sequence Related Parameters that can be Used to Predict Correlating Clonotypes

In order to conduct a population study a training set can be used to understand the characteristics of correlating clonotypes by testing various parameters that can distinguish correlating clonotypes from non-correlating clonotypes. These parameters include the sequence or the specific V, D, and J segments used. In one embodiment it is shown that specific V segments are more likely to correlate with some diseases, as is the case if the clonotypes for a specific disease are likely to recognize related epitopes and hence may have sequence similarity. Other parameters included in further embodiments include the extent of somatic hypermutation identified and the level of a clonotype at the peak of an episode and its level when the disease is relatively inactive. Other parameters that may predict correlating clonotypes include without limitation: 1) sequence motifs including V or J region, a combination VJ, short sequences in DJ region; 2) sequence length of the clonotype; 3) level of the clonotype including absolute level (number of clones per million molecules) or rank level; 4) amino acid and nucleic acid sequence similarity to other clonotypes: the frequency of other highly related clonotypes, including those with silent changes (nucleotide differences that code for the same amino acids) or those with conservative amino acid changes; 5) for the BCRs, the level of somatic mutations in the clonotype and/or the number of distinct clonotypes that differ by somatic mutations from some germline clonotypes; 6) clonotypes whose associated proteins have similar three-dimensional structures.
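Parameter 4 above, relatedness through silent changes, can be checked mechanically. The following is a minimal sketch assuming in-frame nucleotide sequences and the standard genetic code; the helper names `translate` and `is_silent_variant` and the example sequences are hypothetical and serve only to illustrate the idea.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(nucleotides):
    """Translate an in-frame DNA sequence; trailing partial codons are ignored."""
    seq = nucleotides.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def is_silent_variant(seq_a, seq_b):
    """True if two distinct, equal-length DNA sequences encode the same amino acids."""
    return (
        seq_a != seq_b
        and len(seq_a) == len(seq_b)
        and translate(seq_a) == translate(seq_b)
    )

print(translate("TGTGCCAGC"))                       # hypothetical CDR3 fragment
print(is_silent_variant("TGTGCCAGC", "TGCGCTAGT"))  # differs only by silent changes
print(is_silent_variant("TGTGCCAGC", "TGTGCCAGG"))  # last codon changes the amino acid
```

A clonotype that passes this check against a known correlating clonotype encodes the same peptide, which is the situation the text describes for sequences predicted to correlate without having been observed in the discovery study.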

F. Functional Data to Refine the Determination of Correlating Clonotypes

Further embodiments will make use of functional data to aid in identifying correlating clonotypes. For example, T-cells and/or B-cells containing certain markers that are enriched in cells containing correlating clonotypes can be captured through standard methods like FACS or MACS. In another embodiment the marker is a cell-surface marker. In another embodiment, T-cell and/or B-cell reactivity to an antigen relevant to the pathology or to affected tissue would be good evidence of the pathological relevance of a clonotype.

In another embodiment the sequence of the candidate clonotypes can be synthesized and put in the context of the full TCR or BCR and assessed for the relevant reactivity. Alternatively, the amplified fragments of the different sequences can be used as an input to phage, ribosome, or RNA display techniques. These techniques can select for the sequences with the relevant reactivity. The comparison of the sequencing results for those before and after the selection can identify those clones that have the reactivity and hence are likely to be pathological. In another embodiment, the specific display techniques (for example phage, ribosome, or RNA display) can be used in an array format. The individual molecules (or amplifications of these individual molecules) carrying individual sequences from the TCR or BCR (for example CDR3 sequences) can be arrayed either as phages, ribosomes, or RNA. Specific antigens can then be studied to identify the sequence(s) that code for peptides that bind them. Peptides binding antigens relevant to the disease are likely to be pathological.
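The before/after comparison of sequencing results described for display-based selection can be sketched as a fold-enrichment calculation. This is an illustrative sketch under stated assumptions: the function name `enrichment_after_selection`, the `min_fold` cutoff, the `pseudocount` used to avoid division by zero, and the example read counts are all hypothetical.

```python
def enrichment_after_selection(before, after, min_fold=2.0, pseudocount=1.0):
    """Return {clone: fold enrichment} for clones enriched by the selection step.

    before / after: dicts mapping a clone sequence to its read count before and
    after selection. Frequencies are smoothed with an assumed pseudocount.
    """
    clones = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(clones)
    total_after = sum(after.values()) + pseudocount * len(clones)
    enriched = {}
    for clone in clones:
        freq_before = (before.get(clone, 0) + pseudocount) / total_before
        freq_after = (after.get(clone, 0) + pseudocount) / total_after
        fold = freq_after / freq_before
        if fold >= min_fold:
            enriched[clone] = fold
    return enriched

# Hypothetical read counts for three CDR3 sequences.
before_counts = {"CASSLGQ": 100, "CASSPDR": 100, "CASRTGE": 100}
after_counts = {"CASSLGQ": 280, "CASSPDR": 10, "CASRTGE": 10}
reactive = enrichment_after_selection(before_counts, after_counts)
print(sorted(reactive))
```

Clones whose frequency rises after selection are candidates for having the relevant reactivity; the cutoff would need to be chosen with the error profile of the particular display and sequencing methods in mind.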

G. Generating an Immune Load Algorithm

An algorithm can be used to compute an Immune Load (e.g., see FIG. 18). The Immune Load can be used to make a clinical decision. Using data from an experiment (e.g., an experiment comprising samples from subjects in a first state of a disease and samples from subjects in a second state of the disease), an algorithm can be developed that combines the information about the levels of the correlating and non-correlating clonotypes into a single score (Immune Load). The parameters of this algorithm can then be adjusted to maximize the correlation between Immune Load and the clinical data. For example, the clinical data can be a clinical measure of disease severity (e.g., the extent of lesions on an MRI for a multiple sclerosis patient).

The correlating clonotypes used in generating an Immune Load algorithm can be generated using a calibration test, a population study, or a calibration test and a population study as described above.

Some of the factors that can be considered in combining the correlating clonotypes are the number of correlating clonotypes, their level, their rate of change (velocity), and the rate of change in the velocity (acceleration). Other factors to be assessed include the level of the clonotypes at the episode peak and at the inactive disease state.

In one embodiment, the Immune Load generated relates to an autoimmune disease. Such a Load can be referred to as an AutoImm Load.

In one aspect, a method for generating an algorithm that calculates a disease activity score is provided, comprising:

a) developing an algorithm that uses a set of factors to combine levels of correlating clonotypes into a disease activity score, b) comparing the disease activity score to clinical data regarding the disease state, and c) optimizing the factors in order to maximize the correlation between clinical data and the disease activity score.
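Steps a) through c) can be sketched with a deliberately simple model. The sketch assumes that the "set of factors" is one weight per correlating clonotype, that the score is a weighted sum of clonotype levels, and that optimization is an exhaustive search over a small grid using Pearson correlation; none of these choices is specified by the text, and the data are hypothetical.

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def activity_score(levels, weights):
    """Step a): combine clonotype levels into one score using per-clonotype weights."""
    return sum(w * level for w, level in zip(weights, levels))

def optimize_weights(samples, clinical, grid=(0.0, 0.5, 1.0)):
    """Steps b) and c): pick the weights whose scores best correlate with clinical data."""
    best_weights, best_r = None, -math.inf
    for weights in itertools.product(grid, repeat=len(samples[0])):
        scores = [activity_score(levels, weights) for levels in samples]
        r = pearson(scores, clinical)
        if r > best_r:
            best_weights, best_r = weights, r
    return best_weights, best_r

# Hypothetical data: levels of two clonotypes in four samples, plus a clinical measure.
samples = [(10, 5), (20, 1), (30, 9), (40, 2)]
clinical = [1.0, 2.0, 3.0, 4.0]
weights, r = optimize_weights(samples, clinical)
print(weights, round(r, 3))
```

In this toy data only the first clonotype tracks the clinical measure, so the search assigns the second clonotype a weight of zero. A practical implementation would use a proper optimizer and held-out data to avoid overfitting.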

H. Monitoring Disease Using the Load Algorithm

1. Monitoring Disease without a Calibration Test

In one embodiment of the invention the clonotypes and the Immune Load algorithm are determined using a population study (e.g., see FIG. 19). Immune Load can be used directly without having to first calibrate the individual patient. This test can be done when the patient is in any disease state. This test can be used to generate specific correlating and non-correlating clonotypes based on the algorithm developed above. Immune Load can then be calculated using the second algorithm generated in a population study. This score can then be used clinically.

In another aspect, a method for monitoring the disease state of an individual is provided comprising: a) determining a clonotype profile from a sample from a subject, b) inputting the clonotype profile information from a) into an algorithm, and c) using the algorithm to generate a score predictive of the disease state of the individual. The algorithm can be an algorithm generated by a) developing an algorithm that uses a set of factors to combine levels of correlating clonotypes into a disease activity score, b) comparing the disease activity score to clinical data regarding the disease state, and c) optimizing the factors in order to maximize the correlation between clinical data and the disease activity score.

2. Monitoring Disease Using a Calibration Test

In one embodiment of the provided invention the correlating clonotypes and the Immune Load algorithm are determined using a calibration test or a calibration test and a population study (e.g., see FIG. 20). Immune Load can be used in the clinic by first conducting a calibration test. This test can be done when the patient is in a state which is similar to the first state used in the study that generated the correlating and non-correlating clonotypes that are used in the Immune Load algorithm. For example, this state can be a flare state of an autoimmune disease if this is how the Immune Load algorithm was derived. This calibration test can then be used to generate the specific correlating and non-correlating clonotypes to be used in the subsequent disease monitoring tests. At a later point in the treatment of this patient, another test is done on the patient and Immune Load can be calculated using the algorithm generated in the discovery study and the list of clonotype levels generated in this patient's specific calibration test. This Immune Load score can then be used clinically.

In another aspect, a method for monitoring the disease state of an individual is provided comprising: a) determining a clonotype profile from a sample from a subject, b) inputting the clonotype profile information from a) into an algorithm, and c) using the algorithm to generate a score predictive of the disease state of the individual. The algorithm can be an algorithm generated by a) developing an algorithm that uses a set of factors to combine levels of correlating clonotypes into a disease activity score, b) comparing the disease activity score to clinical data regarding the disease state, and c) optimizing the factors in order to maximize the correlation between clinical data and the disease activity score. In another embodiment, the method can further comprise determining one or more correlating clonotypes in the individual by any of the methods of the provided invention, and inputting information on the one or more correlating clonotypes into the algorithm.

In one embodiment, the disease is an autoimmune disease. In another embodiment, the autoimmune disease is systemic lupus erythematosus, multiple sclerosis, rheumatoid arthritis, or ankylosing spondylitis.

3. Other Factors Related to the Use of Immune Load

The same Immune Load may mean different things for different patients. For one, the full clinical picture of a patient needs to be considered. From a testing perspective, one may consider the velocity (rate of change of Immune Load over time) and acceleration (rate of change of velocity over time) in addition to the level of Immune Load in making clinical decisions. For example, if the AutoImm Load score is increasing (high velocity) it may be predictive of an incipient flare in an autoimmune disease.
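Velocity and acceleration as defined above can be estimated from serial measurements by finite differences. The sketch below is one possible way to do so, assuming hypothetical sampling days and Immune Load scores; the function name `rates_of_change` and the choice to place each velocity at the midpoint of its sampling interval are assumptions, not part of the described method.

```python
def rates_of_change(times, loads):
    """Estimate velocity and acceleration of a score from serial measurements.

    times: sampling times (e.g., days), strictly increasing.
    loads: Immune Load scores measured at those times.
    Returns (velocity, acceleration) as lists of finite-difference estimates.
    """
    velocity = [
        (loads[i + 1] - loads[i]) / (times[i + 1] - times[i])
        for i in range(len(loads) - 1)
    ]
    # Each velocity estimate is assigned to the midpoint of its sampling interval.
    midpoints = [(times[i] + times[i + 1]) / 2 for i in range(len(times) - 1)]
    acceleration = [
        (velocity[i + 1] - velocity[i]) / (midpoints[i + 1] - midpoints[i])
        for i in range(len(velocity) - 1)
    ]
    return velocity, acceleration

# Hypothetical monthly measurements of an AutoImm Load score.
days = [0, 30, 60, 90]
scores = [10.0, 10.0, 16.0, 28.0]
velocity, acceleration = rates_of_change(days, scores)
print(velocity, acceleration)
```

Here the score is flat at first and then rises at an increasing rate, which is the pattern the text associates with a possible incipient flare.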

Additional tests that can be integrated in the Load score, for example, an AutoImm Load score, include, for example, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP) levels, anti-ds DNA, other autoantibody titers, complement levels, urine protein levels, urine protein/creatinine ratio, creatinine levels, blood urea nitrogen (BUN) levels, platelet levels, WBC counts, hematocrit (Hct), Hb, and urinalysis results. Other tests that are related to SLE that can be integrated include, for example, CD27 level, CD27++ cell level, IFN-responsive genes (Baechler, E C et al. (2003) Proc. Natl. Acad. Sci. 100:2610-2615), and chemokine score (Bauer J W et al. (2009) Arthritis Rheum. 60:3098-3107). Other tests not related to lupus include, for example, thyroid-stimulating hormone (TSH) test, triiodothyronine (T3) test, thyroxine (T4) test, liver function tests (LFTs), other autoantibodies, calprotectin test, lactoferrin test, and synovial fluid analysis. The additional tests can include imaging tests, including, for example, MRI, CT-scan, X-ray, and ultrasound.

III. Determining Disease States

Because the immune system is so central to human health, the ability to measure immune responses has wide applications in medicine. This invention teaches the ability to use the immune system to understand the underlying disease state when it is mediated by the immune system. This allows a very powerful set of diagnostic and prognostic applications that use the immune profiles to inform the risks of a wide variety of clinical outcomes and allow physicians to intervene more effectively.

A. Utility of Immune Profiling in Autoimmune Disease Treatment

The methods of the provided invention can be used to diagnose and treat autoimmune disease in a subject. Autoimmune disease involves adaptive immune cells escaping the usual process conferring self-tolerance and attacking some target(s) on bodily tissue. Autoimmune diseases include, for example, acute disseminated encephalomyelitis, Addison's disease, ankylosing spondylitis, antiphospholipid antibody syndrome, autoimmune hemolytic anemia, autoimmune hepatitis, autoimmune inner ear disease, Behçet's disease, bullous pemphigoid, Celiac disease, Chagas disease, Chronic obstructive pulmonary disease, dermatomyositis, diabetes mellitus type 1, Goodpasture's syndrome, Graves' disease, Guillain-Barré syndrome, Hashimoto's thyroiditis, Hidradenitis suppurativa, Idiopathic thrombocytopenic purpura, Interstitial cystitis, multiple sclerosis, myasthenia gravis, neuromyotonia, pemphigus vulgaris, pernicious anemia, polymyositis, primary biliary cirrhosis, rheumatoid arthritis, scleroderma, systemic lupus erythematosus, Sjögren's syndrome, and vasculitis syndromes. The stages of these autoimmune diseases can be diagnosed using the methods of the provided invention. Treatments can be suggested to a subject based on the stage of the autoimmune disease.

Clinical information regarding a subject with an autoimmune disease, or suspected of having an autoimmune disease, can be used to determine the disease state (or AutoImm load). Clinical information can be used to identify patterns of a clonotype profile that correlate with a disease state. Clinical information can include, for example, height, weight, eye color, age, gender, ethnic group, blood pressure, LDL cholesterol levels, HDL cholesterol levels, family medical history, and molecular marker information.

Clinical information can include symptoms of one or more autoimmune diseases. For autoimmune hepatitis, symptoms can include fatigue, hepatomegaly, jaundice, pruritus, skin rash, arthralgia, abdominal discomfort, spider angiomas, nausea, vomiting, anorexia, dark urine, and pale or gray stools. For dermatomyositis (DM), symptoms can include rash (patchy, bluish-purple discolorations on the face, neck, shoulders, upper chest, elbows, knees, knuckles and back) accompanying or preceding muscle weakness, dysphagia, myalgia, fatigue, weight loss and low-grade fever. For Graves' disease, symptoms can include weight loss due to increased energy expenditure, increased appetite, heart rate and blood pressure, and tremors, nervousness and sweating. For Hashimoto's thyroiditis, symptoms can include mental and physical slowing, greater sensitivity to cold, weight gain, coarsening of the skin, and goiter. For mixed connective tissue disease (MCTD), symptoms can include features of systemic lupus erythematosus (SLE), scleroderma and polymyositis. For bullous pemphigoid (BP), symptoms can include mildly pruritic welts to severe blisters and infection, and oral or esophageal bullae. For pemphigus, symptoms can include blistering of skin and mucous membranes. For pernicious anemia, symptoms can include shortness of breath, fatigue, pallor, tachycardia, inappetence, diarrhea, tingling and numbness of hands and feet, sore mouth and unsteady gait. For polymyositis (PM), symptoms can include muscle weakness, dysphagia and myalgia. For primary biliary cirrhosis (PBC), symptoms can include fatigue and pruritus. For scleroderma (systemic sclerosis), symptoms can include swelling and puffiness of the fingers or hands, skin thickening, skin ulcers on the fingers, joint stiffness in the hands, pain, sore throat and diarrhea. For Sjögren's syndrome, symptoms can include dryness of the eyes and mouth, swollen neck glands, difficulty swallowing or talking, unusual tastes or smells, thirst and tongue ulcers.
For systemic lupus erythematosus (SLE), symptoms can include fever, weight loss, hair loss, mouth and nose sores, malaise, fatigue, seizures and symptoms of mental illness, joint inflammation similar to RA, butterfly rash on nose and cheeks, and extreme sensitivity to cold in the hands and feet. For vasculitis syndromes, e.g., Wegener's granulomatosis, idiopathic crescentic glomerulonephritis (ICGN), microscopic polyarteritis (MPA), pulmonary renal syndrome (PRS), symptoms can include fatigue, weakness, fever, arthralgia, abdominal pain, renal problems and neurological problems. The clinical information can be from one or more subjects at one or more points of time.

The clinical information can include information regarding responses of a subject with an autoimmune disease to one or more treatments the subject has received.

The clinical utility of AutoImm Load is discussed for specific autoimmune diseases below. Another embodiment of this invention contemplates the combination of the immune profiling tests with other markers that are already in use for the detection of disease activity in these diseases to allow tests with greater sensitivity and specificity. Other molecular identifiers or markers can be used in computing the AutoImm Load or for determining the disease state. Molecular identifiers can include nucleic acids, proteins, carbohydrates, and lipids, and expression profiles of nucleic acids or proteins. The molecular identifiers can be of human or non-human origin (e.g., bacterial). The identifiers or markers can be determined by techniques that include, for example, comparative genomic hybridization (CGH), chromosomal microarray analysis (CMA), expression profiling, DNA microarray, high-density oligonucleotide microarray, whole-genome RNA expression array, peptide microarray, enzyme-linked immunosorbent assay (ELISA), genome sequencing, copy number variation (CNV) analysis, single nucleotide polymorphism (SNP) analysis, immunohistochemistry, in-situ hybridization, fluorescent in-situ hybridization (FISH), PCR, Western blotting, Southern blotting, SDS-PAGE, gel electrophoresis, and Northern blotting.

For systemic lupus erythematosus, markers can include erythrocyte sedimentation rate (ESR), C-reactive protein (CRP) levels, anti-ds DNA, other autoantibody titers, complement levels, urine protein levels, urine protein/creatinine ratio, creatinine levels, blood urea nitrogen (BUN) levels, platelet levels, WBC counts, hematocrit (Hct), Hb, and urinalysis results. Other tests related to SLE that can be integrated include, for example, CD27 level, CD27++ cell level, IFN-responsive genes, and chemokine score.

1. Systemic Lupus Erythematosus (SLE)

The methods of the provided invention can be used to determine states or stages of systemic lupus erythematosus (SLE or lupus). SLE is a serious autoimmune condition that often afflicts young adults (mostly females). It is characterized by inflammatory processes that can affect many organs including the skin, joints, kidneys, lungs, heart, and central nervous system, leading to frequent disabilities and sometimes death. The disease follows a very unpredictable course marked by flare periods followed by quiescent periods of remission. Nevertheless, patients diagnosed with SLE are seen regularly by a rheumatologist and treated with a variety of serious medications. These medications include steroids such as Prednisone and other immunosuppressants such as Cellcept (mycophenolate mofetil). While these drugs can reduce organ damage, they carry significant side effects including risk of infection and infertility. The unreliability of some of the symptoms (e.g., pain and fatigue) and the unpredictable disease course make tailoring medication doses difficult, resulting in overtreatment of some patients and under-treatment of others. As a result, the treatment of SLE poses significant therapeutic challenges to the clinician.

There are a number of standard methods a clinician can use to assess the activity of SLE. The status of the disease can be measured by observing the clinical symptoms of the disease. These methods include assessment of signs (e.g., skin rash) and symptoms (e.g., joint pain and fatigue) as well as lab results (e.g., urine protein/creatinine ratio, anti-ds DNA antibody, and blood counts). These clinical markers, however, can be lagging indicators of disease status, and as such patients may respond only after weeks or months of therapy. Furthermore, in some cases symptoms can be difficult to assess with precision (e.g., pain and fatigue). Other markers of inflammation, for example anti-ds DNA antibody, complement level (e.g., C3), C reactive protein (CRP), and erythrocyte sedimentation rate (ESR), usually lack specificity and/or sensitivity. Invasive methods such as kidney biopsy are impractical for routine use. As a result, clinicians test their patients frequently without having a perfect measure of the disease status. The clinical symptoms and laboratory assessment are integrated in measures such as Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) and Physician Global Assessment (PGA). These measures are not done routinely in clinical practice and often fall short in several clinical situations.

Specific examples of the utility of AutoImm Load in making therapeutic interventions in SLE are discussed in greater detail in the examples section along with specific enabling studies that determine AutoImm Load.

2. Multiple Sclerosis (MS)

The methods of the provided invention can also be used to determine states or stages of Multiple Sclerosis (MS). MS is an autoimmune disease that affects the brain and spinal cord (central nervous system). Symptoms vary, because the location and severity of each attack can be different. Episodes can last for days, weeks, or months. These episodes alternate with periods of reduced or no symptoms (remissions). It is common for the disease to return (relapse). However, the disease may continue to get worse without periods of remission.

Because nerves in any part of the brain or spinal cord may be damaged, patients with multiple sclerosis can have symptoms in many parts of the body. Muscle symptoms include, for example, loss of balance, numbness or abnormal sensation in any area, pain because of muscle spasms, pain in the arms or legs, problems moving arms or legs, problems walking, problems with coordination and making small movements, slurred or difficult-to-understand speech, tremor in one or more arms or legs, uncontrollable spasm of muscle groups (muscle spasticity), and weakness in one or more arms or legs.

Eye symptoms include, for example, double vision, eye discomfort, uncontrollable rapid eye movements, and vision loss (usually affects one eye at a time).

Other brain and nerve symptoms include, for example, decreased attention span, decreased judgment, decreased memory, depression or feelings of sadness, dizziness and balance problems, facial pain, hearing loss, and fatigue.

Bowel and bladder symptoms include, for example, constipation, difficulty beginning urinating, frequent need to urinate, stool leakage, strong urge to urinate, and urine leakage (incontinence).

There is no known cure for multiple sclerosis at this time. However, there are therapies that may slow the disease. The goal of treatment is to control symptoms and help the patient maintain a normal quality of life.

Medications used to slow the progression of multiple sclerosis can include, for example, immune modulators to help control the immune system, including interferons (Avonex, Betaseron, or Rebif), monoclonal antibodies (Tysabri), glatiramer acetate (Copaxone), mitoxantrone (Novantrone), methotrexate, azathioprine (Imuran), cyclophosphamide (Cytoxan), and natalizumab (Tysabri). Steroids can be used to decrease the severity of attacks.

Medications to control symptoms can include, for example, medicines to reduce muscle spasms such as baclofen (Lioresal), tizanidine (Zanaflex), or a benzodiazepine, cholinergic medications to reduce urinary problems, antidepressants for mood or behavior symptoms, and amantadine for fatigue.

MS affects women more than men. The disorder most commonly begins between ages 20 and 40, but can be seen at any age. MS is caused by damage to the myelin sheath, the protective covering that surrounds nerve cells. When this nerve covering is damaged, nerve impulses are slowed down or stopped. MS is a progressive disease, meaning the nerve damage (neurodegeneration) gets worse over time. How quickly MS gets worse varies from person to person. The nerve damage is caused by inflammation. Inflammation occurs when the body's own immune cells attack the nervous system. Repeated episodes of inflammation can occur along any area of the brain and spinal cord. Researchers are not sure what triggers the inflammation. The most common theories point to a virus or genetic defect, or a combination of both. MS is more likely to occur in northern Europe, the northern United States, southern Australia, and New Zealand than in other areas. Geographic studies indicate there may be an environmental factor involved. People with a family history of MS and those who live in a geographical area with a higher incidence rate for MS have a higher risk of the disease.

Symptoms of MS may mimic those of many other nervous system disorders. The disease is diagnosed by ruling out other conditions. People who have a form of MS called relapsing-remitting may have a history of at least two attacks, separated by a period of reduced or no symptoms. The health care provider may suspect MS if there are decreases in the function of two different parts of the central nervous system (such as abnormal reflexes) at two different times. A neurological exam may show reduced nerve function in one area of the body, or spread over many parts of the body.

Tests to diagnose multiple sclerosis include, for example, cerebrospinal fluid tests, including CSF oligoclonal banding, head MRI scan, lumbar puncture (spinal tap), nerve function study (evoked potential test), and spine MRI.

Like other autoimmune diseases, MS follows an unpredictable course with acute flares and periods of remission. There are increasing numbers of therapies, each with side effects that range from serious (weight gain and depression) to life threatening (pancytopenia and PML infections), variable effectiveness in different patients, and high costs. At the same time, the lack of highly accurate and specific routine tests of MS disease activity makes the challenge of effectively administering therapy complicated. Clinical episodes can be separated by long time periods (up to years in early stage disease) even without treatment. In addition, available medications reduce the likelihood of relapse but do not completely prevent it. Therefore disease activity is difficult to assess, and thus there is no adequate short term measure of disease activity that could be used to determine whether a specific therapy is showing efficacy in a given patient by measuring the reduction in number or severity of relapses. The only other test available for monitoring disease activity is brain MRI to track the state of lesions as revealed with the aid of contrast enhancing agents such as gadolinium. However, such imaging offers only an integrated view of brain damage and lacks specificity and time resolution. Attempting to use MRI imaging to follow disease course on time scales shorter than a year is impractical given the costs, the lack of specificity and the dangers of excessive contrast exposure. As a result, patients are often treated at great expense for prolonged periods of time without any effective feedback that would allow the physician to modify dosing and/or switch or add therapies.

3. Rheumatoid Arthritis (RA)

The methods can be used to measure disease status for Rheumatoid arthritis patients. Rheumatoid arthritis (RA) is a chronic, systemic inflammatory disorder that can affect many tissues and organs but principally attacks the joints, producing an inflammatory synovitis that often progresses to destruction of the articular cartilage and ankylosis of the joints. Rheumatoid arthritis can also produce diffuse inflammation in the lungs, pericardium, pleura, and sclera, and also nodular lesions, most common in subcutaneous tissue under the skin. Although the cause of rheumatoid arthritis is unknown, autoimmunity plays a pivotal role in its chronicity and progression.

About 1% of the world's population is afflicted by rheumatoid arthritis, women three times more often than men. Onset is most frequent between the ages of 40 and 50, but people of any age can be affected. It can be a disabling and painful condition, which can lead to substantial loss of functioning and mobility. RA is diagnosed chiefly on symptoms and signs, but can also be diagnosed with blood tests (especially a test called rheumatoid factor) and X-rays. Diagnosis and long-term management are typically performed by a rheumatologist, an expert in the diseases of joints and connective tissues.

Various treatments are available. Non-pharmacological treatment includes physical therapy, orthoses, and occupational therapy. Analgesia (painkillers) and anti-inflammatory drugs, including steroids, can be used to suppress the symptoms, while disease-modifying antirheumatic drugs (DMARDs) can be used to inhibit or halt the underlying immune process and prevent long-term damage. In recent times, the newer group of biologics has increased treatment options.

When RA is clinically suspected, immunological studies can be performed, such as testing for the presence of rheumatoid factor (RF, a specific antibody). A negative RF does not rule out RA; rather, the arthritis is called seronegative. This is the case in about 15% of patients. During the first year of illness, rheumatoid factor is more likely to be negative, with some individuals converting to seropositive status over time. RF is also seen in other illnesses, for example Sjögren's syndrome, and in approximately 10% of the healthy population; therefore the test is not very specific.

Because of this low specificity, new serological tests have been developed, which test for the presence of so called anti-citrullinated protein antibodies (ACPAs). Like RF, these tests are positive in only a proportion (67%) of all RA cases, but are rarely positive if RA is not present, giving them a specificity of around 95%. As with RF, there is evidence for ACPAs being present in many cases even before onset of clinical disease.

The most common tests for ACPAs are the anti-CCP (cyclic citrullinated peptide) test and the Anti-MCV assay (antibodies against mutated citrullinated Vimentin). Recently, a serological point-of-care test (POCT) for the early detection of RA has been developed. This assay combines the detection of rheumatoid factor and anti-MCV for diagnosis of rheumatoid arthritis and shows a sensitivity of 72% and specificity of 99.7%.
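
The sensitivity and specificity figures quoted above can be converted into a positive predictive value (PPV) once a pretest probability is chosen. The following Python sketch is illustrative only. The function name `positive_predictive_value` is hypothetical, and using the roughly 1% population prevalence of RA as the pretest probability is an assumption made for the arithmetic; patients already clinically suspected of RA would have a higher pretest probability.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test result."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Sensitivity/specificity figures are those quoted in the text.
# ASSUMPTION: the ~1% population prevalence is used as the pretest probability.
PREVALENCE = 0.01
acpa_ppv = positive_predictive_value(0.67, 0.95, PREVALENCE)
poct_ppv = positive_predictive_value(0.72, 0.997, PREVALENCE)

print(f"ACPA PPV at 1% prevalence: {acpa_ppv:.3f}")
print(f"POCT PPV at 1% prevalence: {poct_ppv:.3f}")
```

Under these assumptions the PPV is roughly 0.12 for the ACPA figures and roughly 0.71 for the POCT figures, which illustrates how strongly specificity drives predictive value when the condition is rare.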

Also, several other blood tests can be done to allow for other causes of arthritis, such as lupus erythematosus. The erythrocyte sedimentation rate (ESR), C-reactive protein, full blood count, renal function, liver enzymes and other immunological tests (e.g., antinuclear antibody/ANA) are all performed at this stage. Elevated ferritin levels can reveal hemochromatosis, a mimic of RA, or be a sign of Still's disease, a seronegative, usually juvenile, variant of rheumatoid arthritis.

The term Disease modifying anti-rheumatic drug (DMARD) originally meant a drug that affects biological measures such as ESR and hemoglobin and autoantibody levels, but is now usually used to mean a drug that reduces the rate of damage to bone and cartilage. DMARDs have been found both to produce durable symptomatic remissions and to delay or halt progression. This is significant, as such damage is usually irreversible. Anti-inflammatories and analgesics improve pain and stiffness but do not prevent joint damage or slow the disease progression.

There is an increasing recognition among rheumatologists that permanent damage to the joints occurs at a very early stage in the disease. In the past it was common to start therapy with just an anti-inflammatory drug, and assess progression clinically and using X-rays. If there was evidence that joint damage was starting to occur, then a more potent DMARD would be prescribed. Ultrasound and MRI are more sensitive methods of imaging the joints and have demonstrated that joint damage occurs much earlier and in more sufferers than was previously thought. People with normal X-rays will often have erosions detectable by ultrasound that X-ray could not demonstrate. The aim now is to treat before damage occurs.

There may be other reasons why starting DMARDs early is beneficial to preventing structural joint damage. From the earliest stages of the disease, the joints are infiltrated by cells of the immune system that signal to one another in ways that may involve a variety of positive feedback loops (it has long been observed that a single corticosteroid injection may abort synovitis in a particular joint for long periods). Interrupting this process as early as possible with an effective DMARD (such as methotrexate) appears to improve the outcome from the RA for years afterwards. Delaying therapy for as little as a few months after the onset of symptoms can result in worse outcomes in the long term. There is therefore considerable interest in establishing the most effective therapy with early arthritis, when the patient is most responsive to therapy and has the most to gain.

Traditional small molecular mass drugs used to treat arthritis include, for example, chemically synthesized DMARDs: azathioprine, ciclosporin (cyclosporine A), D-penicillamine, gold salts, hydroxychloroquine, leflunomide, methotrexate (MTX), minocycline, and sulfasalazine (SSZ). Cytotoxic drugs include Cyclophosphamide.

The most common adverse events relate to liver and bone marrow toxicity (MTX, SSZ, leflunomide, azathioprine, gold compounds, D-penicillamine), renal toxicity (cyclosporine A, parenteral gold salts, D-penicillamine), pneumonitis (MTX), allergic skin reactions (gold compounds, SSZ), autoimmunity (D-penicillamine, SSZ, minocycline) and infections (azathioprine, cyclosporine A). Hydroxychloroquine may cause ocular toxicity, although this is rare, and because hydroxychloroquine does not affect the bone marrow or liver it is often considered to be the DMARD with the least toxicity. Unfortunately hydroxychloroquine is not very potent, and is usually insufficient to control symptoms on its own.

Biological agents (biologics) can be produced through genetic engineering, and include, for example, tumor necrosis factor alpha (TNFα) blockers: etanercept (Enbrel), infliximab (Remicade), adalimumab (Humira); Interleukin 1 (IL-1) blockers: anakinra (Kineret); monoclonal antibodies against B cells: rituximab (Rituxan); T cell costimulation blocker: abatacept (Orencia); and Interleukin 6 (IL-6) blockers: tocilizumab (an anti-IL-6 receptor antibody) (RoActemra, Actemra).

Anti-inflammatory agents include, for example, glucocorticoids and non-steroidal anti-inflammatory drugs (NSAIDs, most of which also act as analgesics). Analgesics include, for example, paracetamol (acetaminophen in US and Canada), opiates, diproqualone, and lidocaine topical.

The challenge of treating RA lies in the fact that the disease is a long term chronic illness that can result in challenging disability, and for which a large range of treatments exist, each of which has significant drawbacks. Many of the DMARDs subject the patients to significant side effects including increased risk for serious infections, cancer, or even autoimmune disease. Furthermore, the biologically derived drugs are very expensive, and the patient can be subjected to frequent injections.

A doctor initiating therapy for a patient faces many possible options. It would be desirable to get rapid feedback once a patient starts therapy to understand whether the patient is responding to the therapy that is chosen before the clinical manifestation presents itself. Imaging is not sensitive and is expensive, and many blood markers such as CRP lack sufficient sensitivity. A test that would allow the physician to rapidly determine the state of the disease would allow him or her to adjust the therapy quickly to a more effective therapy, saving the patient from additional joint damage and more effectively using the expensive therapies available.

A patient that has not experienced any acute flares since beginning treatment may in fact still be experiencing ongoing inflammatory damage to the joints that has not manifested itself clinically. A test that would allow the doctor to differentiate this state from the background would allow the therapy to be adjusted to try to bring the patient closer to a state in which no ongoing joint damage is being experienced.

Specific examples of how AutoImm Load can be used in managing MS patients are described in further detail in the examples section of this document.

4. Ankylosing Spondylitis

The methods can be used to detect disease activity for Ankylosing spondylitis. Ankylosing spondylitis (AS, from Greek ankylos, bent; spondylos, vertebrae), previously known as Bechterew's disease, Bechterew syndrome, and Marie Strümpell disease, a form of Spondyloarthritis, is a chronic, inflammatory arthritis and autoimmune disease. It mainly affects joints in the spine and the sacroilium in the pelvis, causing eventual fusion of the spine. It is a member of the group of the spondyloarthropathies with a strong genetic predisposition. Complete fusion results in a complete rigidity of the spine, a condition known as bamboo spine.

The typical patient is a young male, aged 18-30, when symptoms of the disease first appear, with chronic pain and stiffness in the lower part of the spine or sometimes the entire spine, often with pain referred to one or other buttock or the back of thigh from the sacroiliac joint. Men are affected more than women by a ratio of about 3:1, with the disease usually taking a more painful course in men than women. In 40% of cases, ankylosing spondylitis is associated with an inflammation of the eye (iridocyclitis and uveitis), causing redness, eye pain, vision loss, floaters and photophobia. Another common symptom is generalised fatigue and sometimes nausea. Less commonly aortitis, apical lung fibrosis and ectasia of the sacral nerve root sheaths may occur. As with all the seronegative spondyloarthropathies, lifting of the nails (onycholysis) may occur.

There is no direct test to diagnose AS. A clinical examination and X-ray studies of the spine, which show characteristic spinal changes and sacroiliitis, are the major diagnostic tools. A drawback of X-ray diagnosis is that signs and symptoms of AS have usually been established as long as 8-10 years prior to X-ray-evident changes occurring on a plain film X-ray, which means a delay of as long as 10 years before adequate therapies can be introduced. Options for earlier diagnosis are tomography and magnetic resonance imaging of the sacroiliac joints, but the reliability of these tests is still unclear. The Schober's test is a useful clinical measure of flexion of the lumbar spine performed during examination.

During acute inflammatory periods, AS patients will sometimes show an increase in the blood concentration of C-reactive protein (CRP) and an increase in the erythrocyte sedimentation rate (ESR), but there are many with AS whose CRP and ESR rates do not increase, so normal CRP and ESR results do not always correspond with the amount of inflammation a person actually has. Sometimes people with AS have normal level results, yet are experiencing a significant amount of inflammation in their bodies.

There are three major types of medications used to treat ankylosing spondylitis: 1) Anti-inflammatory drugs, which include NSAIDs such as ibuprofen, phenylbutazone, indomethacin, naproxen and COX-2 inhibitors, which reduce inflammation and pain. Opioid analgesics have also been proven by clinical evidence to be very effective in alleviating the type of chronic pain commonly experienced by those suffering from AS, especially in time-release formulations. 2) DMARDs such as ciclosporin, methotrexate, sulfasalazine, and corticosteroids, used to reduce the immune system response through immunosuppression. 3) TNFα blockers (antagonists) such as etanercept, infliximab and adalimumab (also known as biologics), which are indicated for the treatment of AS and are effective immunosuppressants in AS as in other autoimmune diseases.

TNFα blockers have been shown to be the most promising treatment, slowing the progress of AS in the majority of clinical cases, helping many patients receive a significant reduction, though not elimination, of their inflammation and pain. They have also been shown to be highly effective in treating not only the arthritis of the joints but also the spinal arthritis associated with AS. A drawback, besides the often high cost, is the fact that these drugs increase the risk of infections. For this reason, the protocol for any of the TNFα blockers includes a test for tuberculosis (like Mantoux or Heaf) before starting treatment. In case of recurrent infections, even recurrent sore throats, the therapy may be suspended because of the involved immunosuppression. Patients taking the TNF medications are advised to limit their exposure to others who are or may be carrying a virus (such as a cold or influenza) or who may have a bacterial or fungal infection.

AS produces symptoms that are very common in the healthy population. For example, a patient presenting with severe back pain need not be experiencing an AS flare but rather might just have routine back pain. The physician is forced to make a decision about whether to treat these symptoms with expensive drugs with potentially severe side effects without a very precise view into the state of the disease. CRP and ESR do not provide a very precise view of the disease status. At the same time the course of the untreated disease can result in debilitating long term spinal damage. This state of affairs leads to a difficult clinical challenge, and significant overtreatment results. The availability of an objective measure that reflects disease activity can be of great help in the management of AS patients.

B. Utility of Immune Profiling in Cancer Detection

These methods can be used to measure cancer risk. Cancer has become the leading cause of death in the industrialized world. Therefore methods of treatment of cancer are in great need. Many approaches for cancer treatment are being attempted including the development of new small molecule drugs as well as antibodies targeting the tumor.

One set of methods that has been proposed is immunotherapy. Tumor surveillance is one of the functions of cells of the immune system. There are several categories of tumor antigens that are recognized by the immune system. The first category comprises novel antigens generated by somatic mutation (point mutation or a translocation) in the tumor. Another category consists of antigens from proteins that are only expressed in male germ cells that do not express MHC molecules. The dysregulation of gene expression in many tumors may allow some of these antigens to be expressed. A third category includes antigens from proteins only expressed in particular tissues. The fourth category comprises antigens that are significantly overexpressed in the tumor tissue. Finally the fifth category includes antigens that result from abnormal posttranslational modification.

One of the properties of tumors is their ability to escape effective elimination by the immune system. It is thought that new mutations acquired in the tumor allow it to go from the equilibrium phase (where the tumor is not completely eliminated but its growth is held in check) to the escape phase where the tumor grows without effective control by the immune system. There are many mechanisms that tumors employ to escape the immune system. These mechanisms include the lack of specific antigenic peptides, or of the co-stimulatory molecules that can activate T cells. Other mechanisms include the tumor secretion of factors that inhibit T cells and the creation of a tumor-induced privileged site by creating a physical barrier separating the tumor from lymphocytes.

Inducing the immune system to better fight the tumor as a strategy for treating cancer is being studied and tested in multiple ways. One approach is adoptive T cell therapy. This approach focuses on identifying T cells that are targeting tumor antigens through isolation of cells that are infiltrating the tumor and/or reacting to a specific tumor antigen. These T cells can be grown in vitro in conditions that enhance their effectiveness, like the use of IL-2 and/or antigen-presenting cells. The expanded cells are then infused back into the patient's blood. Another approach is the use of a retrovirus containing a tumor-specific TCR. These retroviruses can be infused in the patient in special cells that later secrete the retrovirus, allowing it to infect T cells that then start expressing the tumor-specific TCR. Finally a common approach is the use of vaccination. The premise of this approach of therapy is that immunization of the patient with one or more of the tumor antigens will stimulate the immune system's ability to fight the tumor. Immunization is often done with the use of an adjuvant like Bacille Calmette-Guerin (BCG). This approach has been successful in preventing viral-induced cancer, as evident by the ability to prevent cervical cancers induced by HPV-16 and HPV-18. However this has been less successful in the treatment of other tumors.

Much of the improvement in mortality because of cancer has come about due to the availability of better early detection methods, leading for instance to reduced rates of mortality in breast cancer and cervical cancers. The mutability of tumors makes their early treatment much more effective than when they are detected late. Traditionally, the search for cancer detection biomarkers has involved looking for markers that are highly expressed in the cancer and are at low level or absent in the normal tissue. This has led to the identification of several tumor markers, like PSA. One problem with early detection of cancer is that the greatest value of detection occurs when detection of the biomarker is most difficult, i.e., when the tumor is very small. Therefore in order to have an effective cancer detection biomarker that can distinguish patients with small tumors from those without tumors, there needs to be a tremendous difference in expression between the tumor and the normal tissue due to the large difference in size between the tumor and the normal tissue. Additionally the marker needs to “spill” efficiently to the blood or other body fluid to allow detection using a non-invasive technique.

This invention teaches a novel mechanism for cancer detection using the immune cell response. In this view cancer detection is not achieved by the detection of a marker produced by the tumor itself but by the immune system response to the tumor. Specifically the profile of TCR and/or BCR can provide an insight into whether the body is mounting a response to a tumor or not. This can ameliorate some of the issues with current biomarkers. First, the immune response is an amplification signal that can be easier to detect. Second, lymphocytes pass through the blood regularly and hence the relevant biomarker may be more readily present and detectable in peripheral blood than a traditional tumor biomarker. Finally, the problem of “background” biomarker material generated by the normal tissue is greatly reduced. The great diversity of T and/or B cells provides a way to detect the relevant biomarker with high sensitivity and specificity, particularly with the recent availability of high throughput methods for DNA sequencing. The approach of using the immune system response to cancer to detect it leverages the foundations laid for this field by the promise of immunotherapy. However the risk for the two applications is probably quite different. To use the immune response to cancer for its detection does not require that the specific clonotype be effective in treating the tumor but rather that it is associated with the immune response to the tumor.

Another embodiment of this invention contemplates the combination of the immune profiling tests with other markers that are already in use for the detection of cancer to allow tests with greater sensitivity and specificity. Other molecular identifiers or markers can be used in computing the Load algorithm or for determining the disease state. Molecular identifiers can include nucleic acids, proteins, carbohydrates, and lipids, and expression profiles of nucleic acids or proteins. The molecular identifiers can be of human or non-human origin (e.g., bacterial). The identifiers or markers can be determined by techniques that include, for example, comparative genomic hybridization (CGH), chromosomal microarray analysis (CMA), expression profiling, DNA microarray, high-density oligonucleotide microarray, whole-genome RNA expression array, peptide microarray, enzyme-linked immunosorbent assay (ELISA), genome sequencing, copy number variation (CNV) analysis, single nucleotide polymorphism (SNP) analysis, immunohistochemistry, in-situ hybridization, fluorescent in-situ hybridization (FISH), PCR, Western blotting, Southern blotting, SDS-PAGE, gel electrophoresis, and Northern blotting.

C. Utility of Immune Profiling in Transplant Medicine

These methods can be used to detect immune rejection of transplanted organs. Transplantation of organs has become an integral part of medicine with over 25,000 solid organ (kidney, liver, heart, pancreas, and lung) transplants and more than 15,000 bone marrow transplants occurring in the US per year. These are generally complicated procedures done at tertiary care centers. To minimize the risk of transplant rejection, patients are often placed on immunosuppression for extended periods of time, subjecting them to the risk of cancer and infections. Furthermore many transplants are rejected either acutely or years after the transplantation. In spite of these issues organ transplant remains an essential treatment modality as patients with organ failures have few other alternatives.

Solid organ transplant rejection primarily occurs due to the response of the adaptive immune system to the transplanted organ. This is due to the presence of alloantigens in the graft that are recognized by the host's immune system. The rejection can occur in three different phases. The first is the hyperacute phase, within minutes of the transplant, where preformed antibodies mount a response to the graft. The second is the acute rejection that occurs in the first weeks or months after the transplant. The last is chronic rejection, which can occur years after the transplantation. Given these risks, care has been taken to minimize the immunogenic differences between the donor and recipient. For example, the risk of the hyperacute reaction is greatly reduced when the donor and recipient are matched for their ABO subtypes as well as tested for cross matching (determining whether the recipient has antibodies that react with the leukocytes of the donor). Similarly, careful matching for the major histocompatibility complex (MHC) is done to reduce acute rejection. However, given that MHC molecules are very polymorphic, it is very hard to identify a perfect match. Monozygotic twins have a perfect MHC match. Similarly, one quarter of siblings are expected to have a perfect MHC match. Unrelated individuals that have the same detected alleles per the clinical test often have differences due to other polymorphic sites that are not tested in routine clinical practice. However, even with perfect MHC matching from siblings, there is still a significant risk of rejection due to the existence of minor histocompatibility antigens, and indeed acute rejection is very common, occurring in more than half of the grafts.

One might imagine that more aggressive testing of the MHC locus, as well as identification and matching of the minor histocompatibility antigens, would significantly reduce graft rejection and possibly improve survival rates. While that might be true, the limited number of available donor organs makes this task impractical, as more aggressive testing may significantly delay the identification of an appropriate graft to be used for each patient. Therefore, much of the progress that has occurred in the transplantation field was in the use of immunosuppressive agents to prevent and treat rejection. Currently many drugs are utilized for this purpose, including: Azathioprine, corticosteroids, Cyclosporine, Tacrolimus, Mycophenolic Acid, Sirolimus, Muromonab-CD3, Monoclonal Anti-CD25 Antibody, Monoclonal Anti-CD20 Antibody, and Calcineurin inhibitors.

Bone marrow transplant is most frequently used for leukemia and lymphoma treatment. Typically the recipient undergoes an aggressive regimen of radiation and/or chemotherapy to decrease the load of the tumor before the transplantation. Mature T cells from the donor can attack some of the host tissues in the inverse rejection that is called Graft Versus Host Disease (GVHD). This is often manifested by rash, diarrhea, and liver disease. Careful matching of MHC can ameliorate but not eliminate this problem. One solution is the in vitro depletion from the donor bone marrow of the mature T cells that are ultimately responsible for GVHD. One problem with this is that the same phenomenon that causes GVHD may be responsible for some of the therapeutic effect of bone marrow transplant through the graft vs. leukemia effect, where donor T cells attack the remaining cancer cells. In addition, depletion of donor T cells can expose the patient to the risk of being immunodeficient. Therefore the risks and benefits have to be balanced when considering these approaches. Patients are therefore often treated with immunosuppressants to prevent as well as treat GVHD.

Current management of bone marrow transplantation, and even more so of solid organ transplantation, relies heavily on treatment with strong immunosuppressive agents. However, given that these drugs have significant risks, they are used in a manner that balances risk and benefit. Because the risk for a specific patient at a particular time is not well understood, patients are treated with the dose at which risk and benefit are balanced for the average patient. Tests that can predict future rejection events may potentially be very helpful in tailoring treatment to patients at the times they need it. This may result in a reduction of the immunosuppressive doses for some of the patients while reducing the rate of rejection and, hopefully, improving graft survival.

Another embodiment of this invention contemplates the combination of the immune profiling tests with other markers that are already in use for the detection of transplant rejection to allow tests with greater sensitivity and specificity. Other molecular identifiers or markers can be used in computing the Load algorithm or for determining the disease state. Molecular identifiers can include nucleic acids, proteins, carbohydrates, and lipids, and expression profiles of nucleic acids or proteins. The molecular identifiers can be of human or non-human origin (e.g., bacterial). The identifiers or markers can be determined by techniques that include, for example, comparative genomic hybridization (CGH), chromosomal microarray analysis (CMA), expression profiling, DNA microarray, high-density oligonucleotide microarray, whole-genome RNA expression array, peptide microarray, enzyme-linked immunosorbent assay (ELISA), genome sequencing, copy number variation (CNV) analysis, single nucleotide polymorphism (SNP) analysis, immunohistochemistry, in-situ hybridization, fluorescent in-situ hybridization (FISH), PCR, Western blotting, Southern blotting, SDS-PAGE, gel electrophoresis, and Northern blotting.

D. Utility of Immune Profiling in the Treatment of Infection

These methods have utility in guiding the treatment of infections, particularly when these infections can exist in active and latent states. The advent of antibiotics for the treatment of infectious disease over the past century has made a great impact on life expectancy. Over the past decade molecular diagnostic techniques have taken a rapidly increasing role in the diagnosis and management of infectious disease. The excellent sensitivity and specificity provided by nucleic acid amplification has enabled the application of these techniques to an increasing number of applications. Many of the applications are used for the diagnostic evaluation of the presence or absence of infectious agents. For example, testing for sexually transmitted diseases is often done by a molecular test employing a nucleic acid amplification technique. Another set of applications involves the assessment of the "load" of the infection in a patient with an already diagnosed infectious agent. An example of that is the assessment of HIV viral load in patients already diagnosed with AIDS. This test helps the physician determine the state of the patient's disease and hence can provide guidance on the effectiveness of the treatment regimen being used.

It is sometimes helpful to consider not only the level of the infectious agent but also the immune response to the infectious agent. One example where the immune response to the infection is used routinely in clinical practice is hepatitis B. One aspect of hepatitis B testing relies on detecting the infectious agent through detection of hepatitis B antigens or by a nucleic acid amplification assay. In addition, it is common in routine clinical practice to test for the presence of different antibodies that target the hepatitis B virus. The presence of anti-HBc IgM usually occurs in an acute infection setting, whereas the appearance of anti-HBc IgG indicates the infection is chronic. Similarly, the emergence of anti-HBs antibody signals clearing of the infection.

In one embodiment of this invention the value of assessing the immune response to an infection is harnessed along with the sensitivity and specificity of molecular testing. This can be particularly useful for infectious diseases that are chronic, where the infectious agent remains latent in the body. The profile of the TCR and/or BCR can be used to assess the immune response to an infection. Sequencing can be used to obtain a profile of the TCR and/or BCR, allowing the detection of particular clonotypes with high sensitivity and specificity. To determine the specific clonotypes that correlate with disease, several approaches are conceived.

Another embodiment of this invention contemplates the combination of the immune profiling tests with other markers that are already in use for the detection of infectious agents to allow tests with greater sensitivity and specificity. Other molecular identifiers or markers can be used in computing the Load algorithm or for determining the disease state. Molecular identifiers can include nucleic acids, proteins, carbohydrates, and lipids, and expression profiles of nucleic acids or proteins. The molecular identifiers can be of human or non-human origin (e.g., bacterial). The identifiers or markers can be determined by techniques that include, for example, comparative genomic hybridization (CGH), chromosomal microarray analysis (CMA), expression profiling, DNA microarray, high-density oligonucleotide microarray, whole-genome RNA expression array, peptide microarray, enzyme-linked immunosorbent assay (ELISA), genome sequencing, copy number variation (CNV) analysis, single nucleotide polymorphism (SNP) analysis, immunohistochemistry, in-situ hybridization, fluorescent in-situ hybridization (FISH), PCR, Western blotting, Southern blotting, SDS-PAGE, gel electrophoresis, and Northern blotting.

E. Utility of Immune Profiling in the Treatment of Aging Patients

These methods have utility in monitoring the state of the immune system in the aged. Older people suffer from a decline in the immune system called immunosenescence that affects their ability to respond to infections and to raise effective responses to vaccines (Weinberger et al., 2008). This is apparent from the high mortality rates due to pneumonia in the elderly (Office for National Statistics, 2005), and their susceptibility to hospital-acquired infections, such as Clostridium difficile and methicillin-resistant Staphylococcus aureus (Health Protection Agency, 2008). Furthermore, the decline in the ability of the immune system is thought to explain the increased rate of cancers in the elderly. In addition, immunosenescence may contribute to other major diseases of the elderly that have a significant inflammatory component, like Alzheimer's disease and heart disease. An ability to predict which individuals are most at risk for these deadly outcomes would be useful to geriatric physicians as they make clinical decisions about vaccination, aggressive treatment of infections, and hospitalization.

Many aspects of the innate and adaptive immune system are altered in immunosenescence. T cells lose responsiveness, macrophages have a decreased antigen-presenting capacity and altered cytokine secretion, natural killer cells have reduced cytotoxicity, follicular dendritic cells cannot present antigen as efficiently, and neutrophils lose phagocytic ability. There is a smaller pool of naïve T and B cells and an increase in the memory and effector pool. This reduces the diversity of the T and B cell repertoires and, in turn, the ability of the adaptive immune system to respond to new antigens. In particular, T cell repertoires that are associated with cytomegalovirus (CMV) are greatly increased, and as much as 45% of the total T cell repertoire may be devoted to it. It has been noted that these expansions are less pronounced in centenarians.

Studies have suggested that immune markers can predict survival in the elderly. The degree of diversity of the B cell repertoire has been shown to predict survival in the elderly in at least one population. Although these global differences in TCR and BCR diversity were shown to predict clinical outcomes, these markers lack specificity. Deeper analysis of the repertoire data may provide significantly more prediction accuracy. For example, expansions responsive to CMV may have a different significance than other expansions.

In one embodiment of this invention, RNA from the T and B cells found in peripheral blood can be collected from a longitudinal cohort of aging patients whose clinical histories are followed for several years. The TCRα and TCRβ genes and the IgH, IgK and IgL genes can be amplified for each patient in the cohort at several time points in their clinical histories. Profiles of patients with long survival will be compared to those of patients with short survival. First, a global measure of diversity can be obtained. This will include not only the number of different clonotypes identified but also their diversity. For example, is the V, D, J segment usage the same in the two groups, or is one group more restricted in its usage? Two samples may have the same number of independent clonotypes, but the clonotypes of one of the two samples may not cover many of the V segments. It is logical to expect that this sample would be less versatile in responding to a new antigen compared with the other sample, whose clonotypes are distributed among all the V segments.
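The comparison described above, in which two samples have the same number of clonotypes but differ in V segment coverage, can be illustrated with a minimal sketch. The tuple format, the function name `repertoire_diversity`, the use of Shannon entropy as the diversity measure, and the CDR3 strings are all assumptions made for illustration; none of them are specified in this disclosure.

```python
import math

def repertoire_diversity(clonotypes, v_segments):
    """Summarize a clonotype profile (hypothetical input format).

    clonotypes: list of (v_segment, cdr3_sequence, read_count) tuples.
    v_segments: list of all V segment names under consideration.
    """
    total = sum(count for _, _, count in clonotypes)
    freqs = [count / total for _, _, count in clonotypes if count > 0]
    # Shannon entropy is one possible global diversity measure (an assumption).
    shannon = -sum(f * math.log(f) for f in freqs)
    used = {v for v, _, count in clonotypes if count > 0}
    return {
        "n_clonotypes": len(freqs),
        "shannon_entropy": shannon,
        # Fraction of the considered V segments carried by at least one clonotype.
        "v_coverage": len(used & set(v_segments)) / len(v_segments),
    }

# Illustrative subset of V segment names; CDR3 strings are made-up placeholders.
V_SEGMENTS = ["V2", "V13", "V14", "V15"]
sample_a = [("V2", "CASSLGQ", 25), ("V13", "CASSPDR", 25),
            ("V14", "CASRGTE", 25), ("V15", "CATSDLN", 25)]
sample_b = [("V2", "CASSLGQ", 25), ("V2", "CASSQET", 25),
            ("V2", "CASSFGG", 25), ("V2", "CASSYNE", 25)]

summary_a = repertoire_diversity(sample_a, V_SEGMENTS)
summary_b = repertoire_diversity(sample_b, V_SEGMENTS)
print(summary_a["n_clonotypes"], summary_a["v_coverage"])
print(summary_b["n_clonotypes"], summary_b["v_coverage"])
```

In this toy input both samples carry four clonotypes at equal frequency, so a count or entropy measure alone cannot separate them, while the V segment coverage does.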

In addition to global diversity, it will be determined whether expanded clonotypes in patients who had a long survival can be distinguished on the basis of some sequence parameter from clonotypes in patients who had a short survival. This approach can be supplemented by looking at clonotypes that respond to specific antigens. For example, given the available evidence, identification of CMV responsive clonotypes can have predictive power. Capturing T cell clonotypes that are CMV reactive in a discovery study can be done from a set of elderly as well as healthy patients. Sequences of these clonotypes can be studied to identify parameters that distinguish them from other clonotypes. Using this predictive algorithm for CMV clonotypes with the longitudinal cohort described above, it can be assessed whether adding this information improves the ability to distinguish patients who survive for a long time from those who do not.

Another embodiment of this invention contemplates the combination of the immune profiling tests with other markers that are already in use for the detection of health in the aging population to allow tests with greater sensitivity and specificity. Other molecular identifiers or markers can be used in computing the Load algorithm or for determining the disease state. Molecular identifiers can include nucleic acids, proteins, carbohydrates, and lipids, and expression profiles of nucleic acids or proteins. The molecular identifiers can be of human or non-human origin (e.g., bacterial). The identifiers or markers can be determined by techniques that include, for example, comparative genomic hybridization (CGH), chromosomal microarray analysis (CMA), expression profiling, DNA microarray, high-density oligonucleotide microarray, whole-genome RNA expression array, peptide microarray, enzyme-linked immunosorbent assay (ELISA), genome sequencing, copy number variation (CNV) analysis, single nucleotide polymorphism (SNP) analysis, immunohistochemistry, in-situ hybridization, fluorescent in-situ hybridization (FISH), PCR, Western blotting, Southern blotting, SDS-PAGE, gel electrophoresis, and Northern blotting.

F. Utility of Immune Profiling in the Administration of Vaccines

These methods have utility in the administration of vaccines. The use of vaccination has led to a great reduction in the rate of infections by multiple organisms. One infectious disease that continues to have a significant health impact, with over 30,000 deaths a year in the US, is influenza. Influenza vaccination has to be done yearly, as the strain mutates rapidly. Most of the severe sequelae of the disease occur in the elderly. Unfortunately, the elderly often experience immunosenescence, rendering them inadequately responsive to the vaccination.

In order to distinguish patients who are responsive to vaccination from those who are not, a discovery study needs to be performed. In this study, pre-vaccination and post-vaccination (at one or more set times) blood samples are available for a cohort of influenza-vaccinated patients with known influenza outcome (i.e., whether or not they were later protected from the infection). TCR and/or BCR sequences can be obtained from these samples. Clonotypes that are enriched after vaccination in each patient are determined. Enriched clonotypes in patients who responded to the vaccination are then compared to a control set of clonotypes (e.g., the rest of the clonotypes in the same set of patients) to distinguish the correlating clonotypes from other clonotypes. The algorithm to predict these clonotypes is then used to predict correlating clonotypes among patients who did not respond to the vaccination. Patients who did not respond may generate the same type of clonotypes as those who responded, but at lower levels. Alternatively, it might be that non-responders generate a distinct class of clonotypes. The number of correlating clonotypes identified in the non-responders may distinguish between these two possibilities.
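The step of determining which clonotypes are enriched after vaccination can be sketched as a comparison of clonotype frequencies between the pre- and post-vaccination samples. This is only an illustration under stated assumptions: the dictionary input format, the function name `enriched_clonotypes`, the fold-change criterion, the threshold `min_fold`, and the small constant `eps` are hypothetical choices, not parameters given in this disclosure.

```python
def enriched_clonotypes(pre_counts, post_counts, min_fold=4.0, eps=1e-4):
    """Return clonotypes whose frequency rose at least min_fold after vaccination.

    pre_counts, post_counts: dicts mapping clonotype sequence -> read count
    (hypothetical format). eps keeps the ratio finite for clonotypes that
    were not observed before vaccination (an assumed regularization).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    enriched = []
    for clonotype, post_count in post_counts.items():
        post_freq = post_count / post_total
        pre_freq = pre_counts.get(clonotype, 0) / pre_total
        fold = (post_freq + eps) / (pre_freq + eps)
        if fold >= min_fold:
            enriched.append(clonotype)
    return sorted(enriched)

# Toy read counts for one patient; clonotype names are placeholders.
pre = {"clonotype_A": 10, "clonotype_B": 90}
post = {"clonotype_A": 50, "clonotype_B": 45, "clonotype_C": 5}
result = enriched_clonotypes(pre, post)
print(result)
```

Here clonotype_A rises roughly fivefold and clonotype_C appears only after vaccination, so both pass the assumed threshold, while clonotype_B falls and is excluded.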

With the correlating clonotypes identified, an algorithm is then built to generate a score for predicting likelihood of immunization. Data from the profiles of the vaccine-responders and those that do not respond are utilized to generate this algorithm. This algorithm can then be used to predict the likelihood of immunization in the next patient using the predicted correlating clonotypes from a sample obtained after immunization. The prediction is done through the application of another algorithm that has also been generated in the discovery study. It can optionally be aided (or substituted) by data from the pre-calibration to limit the search for correlating clonotypes to those that were enriched after immunization.

Another embodiment of this invention contemplates the combination of the immune profiling tests with other markers that are already in use for the detection of response to vaccination to allow tests with greater sensitivity and specificity. Other molecular identifiers or markers can be used in computing the Load algorithm or for determining the disease state. Molecular identifiers can include nucleic acids, proteins, carbohydrates, and lipids, and expression profiles of nucleic acids or proteins. The molecular identifiers can be of human or non-human origin (e.g., bacterial). The identifiers or markers can be determined by techniques that include, for example, comparative genomic hybridization (CGH), chromosomal microarray analysis (CMA), expression profiling, DNA microarray, high-density oligonucleotide microarray, whole-genome RNA expression array, peptide microarray, enzyme-linked immunosorbent assay (ELISA), genome sequencing, copy number variation (CNV) analysis, single nucleotide polymorphism (SNP) analysis, immunohistochemistry, in-situ hybridization, fluorescent in-situ hybridization (FISH), PCR, Western blotting, Southern blotting, SDS-PAGE, gel electrophoresis, and Northern blotting.

G. Utility of Immune Profiling in the Monitoring of Immune Hypersensitivity (Allergy)

The adaptive immune system has evolved to respond to antigens that are associated with pathogens. As in the case of autoimmune diseases, the immune system can sometimes have the wrong target. Whereas in autoimmune diseases the immune system targets self antigens, in hypersensitivity reactions it mounts a response to harmless stimuli like medications, dust, and food. Hypersensitivity is very common, with as many as 50% of the US population having an allergy to an environmental stimulus, and it is caused by several mechanisms. Hypersensitivity is divided into 4 types. Type I hypersensitivity is the immediate type hypersensitivity and is mediated by IgE. Type II is often due to IgG antibody binding to cell surface-associated antigen. For example, a harmless drug that binds to the surface of the cell can make the cell a target for anti-drug IgG in patients who happen to have these antibodies. Type III is caused by deposition of antigen-antibody complexes on tissues. This occurs, for example, when the amount of antigen is large, resulting in small immune complexes that cannot be cleared efficiently and are instead deposited on blood vessel walls. Type IV hypersensitivity is a delayed type hypersensitivity mediated by T cells. Type I and Type IV have the highest impact on human health.

In a Type I hypersensitivity reaction the patient becomes sensitized to a harmless antigen (allergen) by producing IgE antibody against it. Later exposure to the allergen induces the activation of IgE-binding cells, such as mast cells and basophils. Once activated, these cells cause the allergic reaction by inducing an inflammatory process through the secretion of stored chemicals and the synthesis of cytokines, leukotrienes, and prostaglandins. The dose and the route of entry of the allergen determine the magnitude of the allergic reaction, which can range from symptoms of allergic rhinitis to the life-threatening circulatory collapse in anaphylaxis. Often the acute Type I reaction is followed by a late phase that plays a role in many of the resulting pathological processes. The late phase of recruitment of T helper cells and other inflammatory cells is essentially a Type IV hypersensitivity reaction. Some Type I allergic reactions include seasonal rhinoconjunctivitis (hay fever), food allergy, drug-induced anaphylaxis, atopic dermatitis (eczema), and asthma. These are very common conditions with rising prevalence, causing significant costs as well as morbidity and mortality. For example, asthma is a chronic disease that afflicts ~7% of the US population, causing ~4,000 deaths a year. Some of these diseases have related aspects. For example, patients with atopic dermatitis are at significantly increased risk of having asthma. Food allergy can cause vomiting and diarrhea but can also result in anaphylaxis in a significant number of patients (30,000 cases resulting in ~200 deaths per year in the US). Some of the same allergens that activate submucosal mast cells in the nose, causing symptoms of allergic rhinitis, can also activate mast cells in the lower airways, causing bronchial constriction, a typical symptom of asthma. Some Type IV hypersensitivity reactions are contact dermatitis (e.g., poison ivy), chronic rhinitis, chronic asthma, and celiac disease. Celiac disease is a chronic disease caused by a non-IgE mediated food allergy. It is a disease of the small intestine caused by the allergic response against gluten, a component present in wheat and other foods. Over 95% of celiac patients have a specific MHC class II allele, HLA-DQ2.

Treatment of hypersensitivity reactions differs, but it often has two aspects: the acute treatment and chronic management or prevention. Some of these conditions can be life threatening (anaphylaxis and acute asthma) and require immediate medical attention. Chronic management generally involves trying to avoid the specific allergen. This may be effective when the allergen can be clearly identified (e.g., allergy to nuts), but it can be difficult when the allergen is present widely in the environment, like pollen or dust. Therefore chronic treatment with medications is often utilized for some of these diseases (e.g., asthma and allergic rhinitis). The effectiveness of the treatment is ultimately tested only when the patient is re-exposed to the allergen(s). Therefore some patients may be subject to over- or under-treatment. Ideally a test that assesses the disease activity and the degree to which the patient is prone to mount a hypersensitivity response would be available. Such a test would allow the tailoring of treatment to the individual patient's needs.

EXAMPLES

Example 1 Determining the Sequence of Recombined DNA in a Subject with an Autoimmune Disease

A blood sample is taken from a patient with an autoimmune disease. CD4+ and CD8+ cells are isolated from the blood sample using antibody-coated magnetic beads. PCR is used to amplify the full variable region of the T cell receptor β gene. The amplified fragments are subcloned into vectors and transformed in bacteria to isolate the DNA fragments. The bacteria are grown to amplify the DNA, and dideoxy sequencing is used to sequence the variable regions of the T cell receptor β gene to identify the clonotypes. The sequencing information is used to generate a clonotype profile for the patient. A similar method is shown in FIG. 1.

Example 2 Determining the State of an Autoimmune Disease

A sample of cerebrospinal fluid (CSF) and a sample of blood are taken from a patient at an episode peak of multiple sclerosis. CD4+ cells are isolated from the CSF and blood, and the CDR3 of the T cell receptor β gene is amplified by PCR. The amplified fragments are subcloned into vectors and transformed in bacteria to isolate the DNA fragments. The bacteria are grown to amplify the DNA, and dideoxy sequencing is used to sequence the variable regions of the T cell receptor β gene to identify the clonotypes. The sequencing information is used to generate a clonotype profile for the patient.

Another blood sample is taken when the patient is in a relatively inactive state of multiple sclerosis. The same procedure as above is repeated to generate a clonotype profile. Pathological clonotypes are identified as those that are high at the peak episode and decrease significantly in the inactive state. Another blood sample is taken from the patient at a later state. At this time only a fraction of the T cell receptor β gene CDR3 regions are amplified and then sequenced. This subset contains the pathological clonotypes. The level of the various clonotypes is determined to assess the disease state of the patient.
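The selection rule in this example, clonotypes that are high at the episode peak and much lower in the inactive state, can be sketched as follows. The thresholds `min_peak` and `min_drop_fold`, the floor value, the function names, and the summed-frequency score are hypothetical and chosen only for illustration; in particular, the summed frequency is a stand-in and is not presented as the Load algorithm referred to elsewhere in this disclosure.

```python
def pathological_clonotypes(peak, inactive, min_peak=0.01, min_drop_fold=5.0,
                            floor=1e-6):
    """Select clonotypes that are frequent at peak and reduced when inactive.

    peak, inactive: dicts mapping clonotype -> frequency (0..1) in the
    peak-episode and inactive-state profiles (hypothetical format).
    floor avoids division by zero for clonotypes absent when inactive.
    """
    selected = []
    for clonotype, peak_freq in peak.items():
        inactive_freq = max(inactive.get(clonotype, 0.0), floor)
        if peak_freq >= min_peak and peak_freq / inactive_freq >= min_drop_fold:
            selected.append(clonotype)
    return sorted(selected)

def summed_level(profile, clonotypes):
    """Sum the frequencies of the selected clonotypes in a later sample."""
    return sum(profile.get(c, 0.0) for c in clonotypes)

# Toy frequencies; clonotype names are placeholders.
peak_profile = {"c1": 0.20, "c2": 0.05, "c3": 0.001}
inactive_profile = {"c1": 0.02, "c2": 0.04, "c3": 0.001}
later_profile = {"c1": 0.10, "c2": 0.05}

pathological = pathological_clonotypes(peak_profile, inactive_profile)
level = summed_level(later_profile, pathological)
print(pathological, level)
```

Only c1 is selected in this toy data: c2 is frequent at peak but barely changes, and c3 is too rare at peak to qualify under the assumed threshold.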

Example 3 TCRβ Repertoire Analysis Amplification and Sequencing Strategy

To study amplification of the TCR repertoire, TCRβ chains will be analyzed. The analysis will include amplification, sequencing, and analysis of the TCRβ sequences. One primer, AGCGACCTCGGGTGGGAACA, is complementary to a common sequence in Cβ1 and Cβ2, and there are 34 V primers (Table 1) capable of amplifying all 48 V segments. Cβ1 and Cβ2 differ from each other at positions 10 and 14 from the J/C junction. The primer for Cβ1 and Cβ2 will end at position 16 and should have no preference for Cβ1 or Cβ2.

The 34 V primers are modified from an original set of primers published by the BIOMED-2 group in order to amplify all 48 V segments and all their published alleles as defined by the international ImMunoGeneTics information system (http://imgt.cines.fr/).

The BIOMED-2 primers have been used in multiplex in order to identify clonality in lymphoproliferative diseases.

TABLE 1
Primer sequences complementary to the different V families.

V segment family                      Primer sequence
V20-1                                 AACTATGTTTTGGTATCGTCAGT
V29-1                                 TTCTGGTACCGTCAGCAAC
V9, 5-1, 5-6, 5-5, 5-8, 5-4A          AGTGTATCCTGGTACCAACAG
V9, 5-1, 5-6, 5-5, 5-8, 5-4B          AGTGTGTACTGGTACCAACAG
V9, 5-1, 5-6, 5-5, 5-8, 5-4C          ACTGTGTCCTGGTACCAACAG
V9, 5-1, 5-6, 5-5, 5-8, 5-4D          AGTGTGTCCTGGTACCAACAG
V9, 5-1, 5-6, 5-5, 5-8, 5-4E          TCTGTGTACTGGTACCAACAG
V7-3, 7-6, 7-9, 7-2, 7-4A             CCCTTTACTGGTACCGACAG
V7-3, 7-6, 7-9, 7-2, 7-4B             GCCTTTACTGGTACCGACAG
V7-3, 7-6, 7-9, 7-2, 7-4C             CCCTTTACTGGTACCGACAAA
V7-8, 16A                             TTTTGGTACCAACAGGTCC
V7-8, 16B                             TTTTGGTACCAACAGGCCC
V7-7                                  AACCCTTTATTGGTATCAACAG
V4-1, 4-3, 4-2A                       CGCTATGTATTGGTACAAGCA
V4-1, 4-3, 4-2B                       GGCAATGTATTGGTACAAGCA
V12-3, 12-4, 12-5                     TTTCTGGTACAGACAGACCATGA
V3-1                                  TACTATGTATTGGTATAAACAGGACTC
V25-1                                 CAAAATGTACTGGTATCAACAA
V28, 10-3, 6-2, 6-3, 6-1, 6-6, 24-1A  ATGTTCTGGTATCGACAAGACC
V28, 10-3, 6-2, 6-3, 6-1, 6-6, 24-1B  ATGTACTGGTATCGACAAGACC
V6-4, 6-9A                            TGCCATGTACTGGTATAGACAAG
V6-4, 6-9B                            ATACTTGTCCTGGTATCGACAAG
V10-1, 10-2, 6-5, 6-9, 6-8, 27A       ATATGTTCTGGTATCGACAAGA
V10-1, 10-2, 6-5, 6-9, 6-8, 27B       ATATGTCCTGGTATCGACAAGA
V10-1, 10-2, 6-5, 6-9, 6-8, 27C       ACATGTCCTGGTATCGACAAGA
V14                                   TAATCTTTATTGGTATCGACGTGT
V19                                   GCCATGTACTGGTACCGACA
V18                                   TCATGTTTACTGGTATCGGCAG
V30                                   CAACCTATACTGGTACCGACA
V11-1, 11-3, 11-2A                    CATGCTACCCTTTACTGGTACC
V11-1, 11-3, 11-2B                    CACAATACCCTTTACTGGTACC
V2                                    ATACTTCTATTGGTACAGACAAATCT
V13                                   CACTGTCTACTGGTACCAGCA
V15                                   CGTCATGTACTGGTACCAGCA

The use of the primers for amplification was tested with 34 synthetic sequences. Each synthetic sequence contained on one side the sequence of one of the V primers and on the other side the complement of the C primer. Between the two primer sequences were 6 bp corresponding to the recognition site of the restriction enzyme Cla I. All the synthetic sequences were amplified with the appropriate primers, and it was demonstrated through Cla I digestion that the amplification products were the result of amplifying the synthetic sequences and not of primer dimer formation.

The Illumina Genome Analyzer is the sequencing platform of choice. In each lane, ~15 million reads can be done. Twelve human and 96 mouse samples per lane will be run, and sequence tags will be used to distinguish reads of one sample from those of another. A two-stage amplification scheme can be performed, as illustrated in FIG. 2. As shown in FIG. 2A, the primary PCR will use on one side a 20 bp primer whose 3′ end is 16 bases from the J/C junction and is perfectly complementary to Cβ1 and the two alleles of Cβ2. In the secondary PCR, on the same side of the template, a primer is used that has at its 3′ end the sequence of the 10 bases closest to the J/C junction, followed by 17 bp with the sequence of positions 15-31 from the J/C junction, followed by the P5 sequence. This primer is referred to as C10-17-P5. P5 plays a role in cluster formation. When the C10-17-P5 primer anneals to the template generated from the first PCR, a 4 bp loop (positions 11-14) is created in the template, as the primer hybridizes to the sequence of the 10 bases closest to the J/C junction and to the bases at positions 15-31 from the J/C junction. The looping of positions 11-14 eliminates differential amplification of templates carrying Cβ1 or Cβ2. Ultimately, sequencing is done with a primer complementary to the sequence of the 10 bases closest to the J/C junction and the bases at positions 15-31 from the J/C junction (this primer will be called C′). The C10-17-P5 primer can be HPLC purified in order to ensure that all the amplified material has intact ends that can be efficiently utilized in cluster formation.

In FIG. 2B, the length of the overhang on the V primers is shown to be 14 bp. The first PCR may be helped by a shorter overhang. On the other hand, for the sake of the second PCR, it can be advantageous to have the overhang in the V primer used in the first PCR as long as possible, because the second PCR will be priming from this sequence. Very inefficient priming in the second PCR may limit the representation in the final data.

The minimum size of the overhang that supports an efficient second PCR was investigated. Two series of V primers (for two different V segments) with overhang sizes from 10 to 30 bp in 2 bp steps were made. Using the appropriate synthetic sequences, the first PCR was performed with each of the primers in the series, and gel electrophoresis was performed to show that all amplified. In order to measure the efficiency of the second PCR amplification, SYBR green real time PCR was performed using as templates the PCR products from the different first PCR reactions and as primers Read2-tag1-P7 and Read2-tag2-P7. A consistent picture emerged using all 4 series of real time data (2 primary PCRs with two different V segments and two secondary PCRs with different primers containing two different tags). There was an improvement in efficiency between overhang sizes of 10 and 14 bp. However, there was little or no improvement in efficiency with an overhang over 14 bp. The efficiency remained high as the overhang became as small as 14 bp because the high concentration of primers allowed the 14 bp to be a sufficient priming template at a temperature much higher than their melting temperature. At the same time the specificity was maintained because the template was not all the cDNA but rather a low complexity PCR product where all the molecules had the 14 bp overhang.

As illustrated in FIG. 2B, the primary PCR will use 34 different V primers that anneal to the other side of the template and contain a common 14 bp overhang on the 5′ tail. The 14 bp is the partial sequence of one of the Illumina sequencing primers (termed the Read 2 primer). The second amplification primer on the same side includes the P7 sequence, a tag, and the Read 2 primer sequence (this primer is called Read2_tagX_P7). The P7 sequence is used for cluster formation. Read 2 and its complement are used for sequencing the V segment and the tag, respectively. A set of 96 of these primers with tags numbered 1 through 96 was created (see below). These primers can be HPLC purified in order to ensure that all the amplified material has intact ends that can be efficiently utilized in cluster formation.

As mentioned above, the second stage primer, C-10-17-P5 (FIG. 2A), has interrupted homology to the template generated in the first stage PCR. The efficiency of amplification using this primer has been validated. An alternative primer to C-10-17-P5, termed CsegP5, has perfect homology to the first stage C primer and a 5′ tail carrying P5. The efficiency of using C-10-17-P5 and CsegP5 in amplifying first stage PCR templates was compared by performing real time PCR. In several replicates, it was found that PCR using the C-10-17-P5 primer had little or no difference in efficiency compared with PCR using the CsegP5 primer.

The molecule resulting from the 2-stage amplification illustrated in FIG. 2 will have the structure typically used with the Illumina sequencer, as shown in FIG. 3. Two primers that anneal to the outermost part of the molecule, Illumina primers P5 (AATGATACGGCGACCACCGAG) and P7 (CAAGCAGAAGACGGCATACGAGAT), will be used for solid phase amplification of the molecule (cluster formation). Three sequence reads are done per molecule. The first read of 100 bp is done with the C′ primer, which has a melting temperature that is appropriate for the Illumina sequencing process. The second read is only 6 bp long and serves solely to identify the sample tag. It is generated using the Illumina Tag primer (AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC). The final read uses the Read 2 primer, an Illumina primer with the sequence GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. Using this primer, a 100 bp read in the V segment will be generated starting with the 1st PCR V primer sequence.

A set of 6 bp sequence tags to distinguish different samples run in the same sequencing lane was designed, where each tag differs from all the other tags in the set by at least 2 differences. The 2 differences prevent misassignment of a read to the wrong sample if there is a sequencing error. The alignment done to compare the tags allowed gaps, and hence one deletion or insertion error in sequencing will also not assign the read to the wrong sample. Additional criteria in selecting the tags were to limit single base runs (4 A or T and 3 G or C) and to avoid similarity to the Illumina primers. In total 143 tags were generated with the premise that 96 of them will be used.
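The tag selection criteria above can be sketched in code. The following is a minimal illustration, not the procedure actually used to generate the 143 tags: it greedily scans all 6 bp sequences and keeps those that differ from every previously kept tag in at least 2 positions. It assumes that the run limits mean a maximum allowed run of 4 for A or T and 3 for G or C, it measures differences by substitutions only (the gapped alignment and the comparison against Illumina primers mentioned in the text are not modeled), and the function names are hypothetical.

```python
from itertools import product

def max_run(tag, bases):
    """Longest run of one repeated base, counting only bases listed in `bases`."""
    best = run = 0
    prev = None
    for b in tag:
        run = run + 1 if b == prev else 1
        prev = b
        if b in bases:
            best = max(best, run)
    return best

def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))

def pick_tags(length=6, min_diff=2, max_at_run=4, max_gc_run=3, limit=None):
    """Greedy selection of tags meeting the run limits and pairwise distance."""
    chosen = []
    for combo in product("ACGT", repeat=length):
        tag = "".join(combo)
        # Assumed interpretation: reject runs longer than the stated limits.
        if max_run(tag, "AT") > max_at_run or max_run(tag, "GC") > max_gc_run:
            continue
        if all(hamming(tag, t) >= min_diff for t in chosen):
            chosen.append(tag)
            if limit is not None and len(chosen) == limit:
                break
    return chosen

tags = pick_tags(limit=96)
```

A greedy scan is order dependent and will generally not return the same tags, or the same number of tags, as the set described in the text.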

Real time PCR has been performed with all 34 different primers using cDNA obtained from a blood sample. Different Ct values were obtained for the different primers. Each of the PCR products was run by gel electrophoresis and demonstrated a single band. In addition, all 34 primers were pooled and a PCR was performed, and again a single PCR band was obtained.

Amplification Optimization

The multiplex amplification can use all the V segments. One issue in amplification of different sequences is the relative amplification efficiency of the different sequences and the preservation of the initial relative quantity of the different sequences in the final amplified material. The relative amplification efficiency can be subdivided into the different efficiencies of the distinct primer sequences and the different efficiencies of amplification of different sequences using the same primer. Efficiency differences can be due to distinct primer sequences. The reaction will be optimized to attempt to get amplification that maintains the relative abundance of the sequences amplified by different V segment primers. Some of the primers are related, and hence many of the primers may “cross talk,” amplifying templates that are not perfectly matched with them. The conditions can be optimized so that each template is amplified in a similar fashion irrespective of which primer amplified it. In other words, if there are two templates, then after 1,000 fold amplification both templates should be amplified approximately 1,000 fold, and it does not matter that for one of the templates half of the amplified products carried a different primer because of the cross talk. In subsequent analysis of the sequencing data the primer sequence will be eliminated from the analysis, and hence it does not matter what primer is used in the amplification as long as the templates are amplified equally.

Since the amount of each template is not known in cDNA, a set of standards has been generated using the 34 singleplex PCR reactions from cDNA. The product in each of these reactions comprised a plurality of sequences with one V primer. The different products were carefully quantitated to create a set of standards at the same concentration. A pool of all 34 primers was used, and 34 real time PCRs were performed using the pool of primers and each of the standard sequences as a template. Ideally all 34 standards will show equal efficiency of amplification by real time PCR. That suggests that each sequence is amplified equally even though the presence of cross talk makes it unclear what primers are carrying out the amplification. This optimization is consistent with the goal of having equal amplification irrespective of the actual primer that is incorporated in the amplification product. Increasing the total primer pool concentration significantly reduced the dynamic range, as expected from increasing the efficiency of the amplification. Furthermore, for templates that seemed to amplify more efficiently than the average, the concentration of their perfectly matched primer in the pool was decreased. Conversely, for templates that were inefficiently amplified, the concentration of their perfectly matched primer was increased. This optimization demonstrated that all the templates are amplified within 2 fold of the average amplification.

Ideally the primary PCR will have a small number of cycles to minimize the differential amplification by the different primers. The secondary amplification is done with one pair of primers, and hence the issue of differential amplification is minimal. One percent of the primary PCR is taken directly to the secondary PCR. Thirty-five cycles (equivalent to ~28 cycles without the 100 fold dilution step) used between the two amplifications were sufficient to show a robust amplification irrespective of whether the breakdown of cycles was one cycle primary and 34 secondary or 25 primary and 10 secondary. Even though ideally doing only 1 cycle in the primary PCR may decrease the amplification bias, there are other considerations. One aspect of this is representation. This plays a role when the starting input amount is not in excess of the number of reads ultimately obtained. For example, if 1,000,000 reads are obtained starting with 1,000,000 input molecules, then taking representation from only 100,000 molecules to the secondary amplification would degrade the precision of estimating the relative abundance of the different species in the original sample. The 100 fold dilution between the 2 steps means that the representation will be reduced unless the primary PCR amplification generated significantly more than 100 molecules. This translates to a minimum of 8 cycles (256 fold), but more comfortably 10 cycles (~1,000 fold). The alternative is to take more than 1% of the primary PCR into the secondary, but because of the high concentration of primer used in the primary PCR, a big dilution factor can be used to ensure these primers do not interfere in the amplification and worsen the amplification bias between sequences. Another alternative is to add a purification or enzymatic step to eliminate the primers from the primary PCR to allow a smaller dilution of it. In this example, the primary PCR was 10 cycles and the second 25 cycles.
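The cycle arithmetic in this paragraph can be checked with a short calculation. The sketch below assumes ideal doubling in every cycle, which real reactions only approximate; the function names and the `safety_factor` parameter are illustrative and do not come from the text.

```python
import math

def effective_cycles(total_cycles, dilution_fold):
    """Net cycles of amplification after subtracting the cycles 'spent'
    on recovering from a dilution between the two stages (ideal doubling)."""
    return total_cycles - math.log2(dilution_fold)

def min_primary_cycles(dilution_fold, safety_factor=1.0):
    """Smallest number of primary cycles whose ideal fold amplification
    strictly exceeds dilution_fold * safety_factor."""
    return math.floor(math.log2(dilution_fold * safety_factor)) + 1

net = effective_cycles(35, 100)               # about 28.4 cycles
bare_minimum = min_primary_cycles(100)        # 7 cycles, 128 fold
with_margin = min_primary_cycles(100, 2)      # 8 cycles, 256 fold
comfortable = min_primary_cycles(100, 10)     # 10 cycles, 1,024 fold
```

Seven cycles only just exceed the 100 fold dilution, which is consistent with the text asking for "significantly more" and settling on 8 to 10 cycles.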

Testing High Fidelity Enzymes

Enzymes with higher fidelity can be used to minimize errors. The assay has been optimized using Taq polymerase. In order to validate the use of Accuprime as well as Taq high fidelity, a cDNA with the pool of primers was amplified using Taq polymerase, Accuprime, or Taq high fidelity. Each of the amplified materials was then used as a template for 34 real time PCRs, one with each of the 34 V primers and the 1 C primer. The relative amount of the templates was quantitated. A high correlation (r²>0.95) between the concentration of each template in the Accuprime, Taq high fidelity, and Taq products was found, validating the use of these high fidelity enzymes.

Validation of Amplification Conditions

The optimization was done using a pool of primers on the standard templates, not with the cDNA background. The goal was to obtain validation for these results in a cDNA mixture. In order to show reproducibility, the pool of oligos was used to amplify cDNA in duplicate. Each of the 34 products was quantitated in each of the two amplifications. As shown in FIG. 5, the reproducibility was excellent.

For FIG. 5, two primary PCR reactions were performed using the pooled TCRβ primers and the C primer and one cDNA sample as a template. The relative abundance in each of the amplified materials of template that is amplifiable with each of the 34 V primers (and the one C primer) was assessed using real time PCR. Using each of the two amplified products as a template, thirty four different real time PCR reactions were performed using the C primer and one of the V primers in each reaction. The relative abundance determined by real time PCR was highly reproducible using all the V primers for the two samples, indicating that the multiplexed amplification is highly reproducible. The cycle number (Ct value) for each of the real time PCR amplifications using the first multiplexed amplification product as a template is shown on the X axis, and using the second multiplexed amplification product as a template is depicted on the Y axis.

In order to assess the amplification bias a similar technique can be employed. The pool of oligos can be used to amplify using cDNA as a template. Then the amount of template amplified by each of the 34 different primers (along with the C segment primer) can be quantitated using real time PCR, and that amount can be compared with the amount amplified using the same primer from the cDNA. However, since there is cross talk, big differences may be detected using this readout even if the relative abundance among the internal sequences in the amplified product and the cDNA were the same. To alleviate this issue, 12 oligos were designed that can, when used with the C segment primer, amplify sequences internal to the V segment primers. If optimization was done appropriately, then the concentration of these internal sequences should change little between the cDNA and the amplified products. This is shown in FIG. 6.

For FIG. 6, a cDNA sample was used as a template for a multiplexed amplification using the pooled TCRβ primers and the C primer. To assess the relative abundance of the different sequences, real time PCR was performed using the material from the multiplex amplification as a template, together with the C primer and primers that are downstream of the V primers used for the initial amplification (termed internal primers). Similarly, real time PCR was used to assess the relative abundance of these same sequences in the cDNA. If the multiplexed amplification had great bias, the relative abundance in the amplified material could be very different from that in the cDNA. As can be seen in FIG. 6, high correlation was seen, demonstrating minimal amplification bias in the multiplexed amplification. The cycle number (Ct value) for each of the real time PCR amplifications using internal primers is shown with cDNA as template on the X axis and with the multiplexed amplification product as template on the Y axis.

Sequencing TCRβ

Six multiplexed amplifications with the pooled oligos and one cDNA sample as a template were used. Three of the amplifications were done with Accuprime and another 3 with high fidelity Taq. Two amplifications with each enzyme used cDNA that corresponds to 500 ng initial RNA, and one amplification with each enzyme used 10 times less cDNA. For each of the six reactions a primary and secondary PCR was performed, and the amplified material was sequenced using the Illumina platform and the scheme described above. 100 bp of sequence from each side was obtained. The primary analysis of the data was done using the same concepts described below.

To assess reproducibility of the assay it was determined whether clonotype levels are consistent in the duplicate experiments. As shown in FIGS. 8A-C, high correlation is obtained when the same enzyme and starting input cDNA amount were used (each of the 2 comparisons had r²=0.944). When different enzymes were used the correlation gets worse (median correlation for the 4 possible combinations r²=0.931), and it is only modestly reduced (r²=0.924) when the 2 enzymes were used to amplify smaller input cDNA (corresponding to only 50 ng RNA).

For FIG. 8, identical sequences in each sample were identified. Then, to deal with sequencing errors, some clonotypes were coalesced to form larger clonotypes using the general approaches described in the section on primary analysis of sequence. The counts of clonotypes were then computed in each sample. A fraction of the clonotypes (not shown in the figure) were present in one sample but not another, likely due to the algorithm coalescing them with another clonotype in one sample but not the other. The frequency of a clonotype in a sample is then computed as its number of counts divided by the total number of reads obtained for that sample. For example, if 1,000 counts are observed for a clonotype in a sample with 1,000,000 reads, its frequency is computed as 0.1%. FIG. 8A shows the log₁₀ of the frequency of each clonotype in the two duplicate samples using Accuprime and cDNA corresponding to 500 ng of RNA as input template. The correlation (r²) between these duplicates is 0.944. FIG. 8B depicts the log₁₀ of the frequency of each clonotype using cDNA corresponding to 500 ng of RNA as input template and Accuprime (X axis) or High fidelity Taq (Y axis). There are 4 comparisons with this combination, with a median correlation r²=0.931. The one shown in the figure has r²=0.929. FIG. 8C shows the log₁₀ of the frequency of each clonotype using cDNA corresponding to 50 ng of RNA as input template and Accuprime (X axis) or High fidelity Taq (Y axis). The observed correlation is r²=0.924.
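The frequency and correlation calculations described for FIG. 8 can be illustrated as follows. This is a minimal sketch with hypothetical function names; it assumes that r² is the squared Pearson correlation of the log₁₀ frequencies and that only clonotypes present in both samples are compared, matching the statement that clonotypes found in only one sample are not shown.

```python
import math

def clonotype_frequencies(counts, total_reads=None):
    """Frequency of each clonotype: its count divided by the sample's total reads.
    If total_reads is not given, the sum of the counts is used (an assumption)."""
    total = total_reads if total_reads is not None else sum(counts.values())
    return {clonotype: n / total for clonotype, n in counts.items()}

def log_frequency_r2(freq_a, freq_b):
    """Squared Pearson correlation of log10 frequencies over shared clonotypes.
    Requires at least two shared clonotypes with differing frequencies."""
    shared = sorted(set(freq_a) & set(freq_b))
    xs = [math.log10(freq_a[c]) for c in shared]
    ys = [math.log10(freq_b[c]) for c in shared]
    n = len(shared)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Worked example from the text: 1,000 counts in 1,000,000 reads is 0.1%.
example = clonotype_frequencies({"clonotype_1": 1000}, total_reads=1_000_000)
```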

These results validate the reproducibility of the assay, and conform to the expectation that reproducibility gets worse when different enzymes are compared. Further reduction is seen when a lower amount of input cDNA is used, reflecting that lower representation in the input material leads to poorer precision in reflecting the relative abundance of the different clonotypes in the blood. Additionally, it is possible that some of the reduction in the correlation is due to the additional amplification (10 fold) needed for the lower input, but this is likely to be the minor effect given the evidence for the high reproducibility of the amplification.

Example 4 IgH Repertoire Analysis Amplification and Sequencing Strategy

One difference between amplification of CDR3 in TCRβ and IgH is that multiple primers for each V sequence will be used in investigation of IgH due to the possibility of somatic mutations in IgH. Three different primers for each V segment will be used. The primers are in regions avoiding the CDRs, which have the highest somatic mutations. Three different amplification reactions will be performed. In each reaction, each of the V segments will be amplified by one of the three primers and all will use the same C segment primers. The primers in each reaction will be approximately the same distance from the V-D joint. Assuming the last position of the V segment as 0, the first set of primers (A) have the 3′ end at approximately −255, the second (B) have the 3′ end at approximately −160, and the third (C) have the 3′ end at approximately −30. Given the homology between several V segments, to amplify all 48 V segments and the many known alleles (as defined by the international ImMunoGeneTics information system http://imgt.cines.fr/), 23, 33, and 32 primers in the A, B, and C frames, respectively, will be needed. The lists of primers are shown in Tables 2, 3, and 4.

TABLE 2  Frame A primers

Primer      Sequence (frame A)
IGHV1_1     CCTCAGTGAAGGTCTCCTGCAAGG
IGHV1_2     CCTCGGTGAAGGTCTCCTGCAAGG
IGHV1_3     CCTCAGTGAAGGTTTCCTGCAAGG
IGHV1_4     GGGCTACAGTGAAAATCTCCTGCAAGG
IGHV2_1     AAACCCACACAGACCCTCACGCTGAC
IGHV2_2     AAACCCACAGAGACCCTCACGCTGAC
IGHV2_3     AAACCCACACAGACCCTCACACTGAC
IGHV3_1     CTGGGGGGTCCCTGAGACTCTCCTG
IGHV3_2     CTGGGGGGTCCCTTAGACTCTCCTG
IGHV3_3     CAGGGCGGTCCCTGAGACTCTCCTG
IGHV3_4     CAGGGCCGTCCCTGAGACTCTCCTG
IGHV3_5     CTGGGGGGTCCCTGAAACTCTCCTG
IGHV3_6     CTGGCAGGTCCCTGAGACTCTCCTG
IGHV3_7     CTGGAGGGTCCCTGAGACTCTCCTG
IGHV3_8     CTGGGAGGTCCCTGAGACTCTCCTG
IGHV3_9     TGGGGGGGCCCTGAGACTCTCCT
IGHV4_1     CTTCGGAGACCCTGTCCCTCACCTG
IGHV4_2     CTTCGGACACCCTGTCCCTCACCTG
IGHV4_3     CTTCACAGACCCTGTCCCTCACCTG
IGHV4_4     CTTCGGAGACCCCGTCCCTCACCTG
IGHV4_5     CGGGGACCCTGTCCCTCACCTG
IGHV5_1     GATCTCCTGTAAGGGTTCTGGATACAGCT
IGHV6       TCGCAGACCCTCTCACTCACCTGTG

TABLE 3  Primers for frame B

Primer      Sequence (frame B)
IGHV6       TGGATCAGGCAGTCCCCATCGAGAG
IGHV5_1     GCTGGGTGCGCCAGATGCCC
IGHV2_1     TGGATCCGTCAGCCCCCAGG
IGHV2_2     TGGATCCGTCAGCCCCCGGG
IGHV1_1     GTGCGACAGGCCCCTGGACAA
IGHV1_2     GGGTGCGACAGGCCACTGGACAA
IGHV1_3     GTGCGCCAGGCCCCCGGACAA
IGHV1_4     GGGTGCGACAGGCTCGTGGACAA
IGHV1_5     GGGTGCAACAGGCCCCTGGAAAA
IGHV1_6     GGGTGCGACAGGCTCCTGGAAAA
IGHV1_7     GTGCGACAGGCCCCCGGACAA
IGHV1_8     GTGCGACAGGCCCCCAGACAA
IGHV4_1     TCCGCCAGCCCCCAGGGAAGG
IGHV4_2     TCCGGCAGCCCCCAGGGAAGG
IGHV4_3     TCCGGCAGCCACCAGGGAAGG
IGHV4_4     TCCGCCAGCACCCAGGGAAGG
IGHV4_5     TCCGGCAGCCCGCCGGGAA
IGHV4_6     TCCGGCAGCCGCCGGGGAA
IGHV4_7     TCCGGCAGCCCGCTGGGAAGG
IGHV4_8     TCCGCCAGCCCCTAGGGAAGG
IGHV3_1     GGTCCGCCAGGCTCCAGGGAA
IGHV3_2     GTTCCGCCAGGCTCCAGGGAA
IGHV3_3     GGTCCGCCAGGCTTCCGGGAA
IGHV3_4     GGTCCGTCAAGCTCCGGGGAA
IGHV3_5     GATCCGCCAGGCTCCAGGGAA
IGHV3_6     GGTCCGCCAAGCTCCAGGGAA
IGHV3_7     GGTCCGCCAGGCTCCAGGCAA
IGHV3_8     GGTCCGCCAGGCCCCAGGCAA
IGHV3_9     GGTCCGCCAGGCTCCGGGCAA
IGHV3_10    GGGTCCGTCAAGCTCCAGGGAAGG
IGHV3_11    CTGGGTCCGCCAAGCTACAGGAAA
IGHV3_12    GGTCCGCCAGCCTCCAGGGAA
IGHV3_13    GGTCCGGCAAGCTCCAGGGAA

TABLE 4  Primers for frame C

Primer      Sequence (frame C)
IGHV7       CTAAAGGCTGAGGACACTGCCGTGT
IGHV6       CTCTGTGACTCCCGAGGACACGGCT
IGHV5_1     AGTGGAGCAGCCTGAAGGCCTC
IGHV2_1     TGACCAACATGGACCCTGTGGACAC
IGHV1_1     ACATGGAGCTGAGCAGCCTGAGATC
IGHV1_2     ACATGGAGCTGAGCAGGCTGAGATC
IGHV1_3     ACATGGAGCTGAGGAGCCTGAGATC
IGHV1_4     ACATGGAGCTGAGGAGCCTAAGATCTGA
IGHV4_1     GAGCTCTGTGACCGCCGCGGAC
IGHV4_2     GAGCTCTGTGACCGCCGTGGAcA
IGHV4_3     GAGCTCTGTGACCGCTGCAGACACG
IGHV4_4     GAGCTCTGTGACCGCTGCGGAcA
IGHV4_5     GAGCTCTGTGACTGCCGCAGAcAcG
IGHV4_6     GAGCTCTGTGACTGCAGCAGACACG
IGHV4_7     GAGCTCTGTGACTGCCGCGGAcA
IGHV4_8     GAGCTCTGTGACCGCGGAcGcG
IGHV4_9     GGCTCTGTGACCGCCGCGGAC
IGHV4_10    GAGCTCTGTGACCGCCGCAGAcA
IGHV4_11    GAGCTCTGTGACCGCTGACACGG
IGHV3_1     CAAATGAACAGCCTGAGAGCCGAGGACA
IGHV3_2     CAAATGAACAGCCTGAAAACCGAGGACA
IGHV3_3     CAAATGAACAGTCTGAAAACCGAGGACA
IGHV3_4     CAAATGATCAGCCTGAAAACCGAGGACA
IGHV3_5     CAAATGAACAGTCTGAGAACTGAGGACACC
IGHV3_6     CAAATGAACAGTCTGAGAGCCGAGGACA
IGHV3_7     CAAATGAACAGCCTGAGAGCTGAGGACA
IGHV3_8     CAAATGAGCAGCCTGAGAGCTGAGGACA
IGHV3_9     CAAATGAACAGCCTGAGAGACGAGGACA
IGHV3_10    CAAATGGGCAGCCTGAGAGcTGAGGAcA
IGHV3_11    CAAATGAACAGCCTGAGAGCCGGGGA
IGHV3_12    CAAATGAACAGTCTGAGAGCTGAGGACA
IGHV3_13    CAAATGAGCAGTCTGAGAGCTGAGGACA

On the C segment side, two sequences with one base difference between them (GCCAGGGGGAAGACCGATGG and GCCAGGGGGAAGACGGATGG) cover the four segments and the multiple known alleles of IgG. A scheme similar to the two stages of PCR for TCRβ genes will be used. On the V side, the same 5′ 14 bp overhang on each of the V primers will be used. In the secondary PCR, the same Read2-tagX-P7 primer on the V side is employed. On the C side a strategy similar to that used with TCRβ amplification will be used to avoid variants among the different IgG segments and their known alleles. The primer sequence (AATGATACGGCGACCACCGAGATCTGGGAAG ACGAT GG GCC CTT GGT GGA) comprises the sequence of the C segment from positions 3-19 and 21-28, skipping position 20, which has a different base in at least one of the different IgG alleles, and the sequence for P5 that can be used for formation of the clusters as shown in FIG. 4.

All the primers in the 3 frames were successful in amplifying a single band from cDNA. Similarly, the primary and secondary PCR strategy using the three pools of primers in the primary PCR showed a single band, as shown in FIG. 7.

For FIG. 7, multiplexed PCR using 3 pools of primers corresponding to the 3 frames was done using cDNA as a template. After the primary and secondary PCR the products were run on an agarose gel. A denotes the PCR product from the pool of oligos of frame A. Similarly, B and C denote the products of pools B and C. M is a marker lane. Single bands with the appropriate sizes were obtained using all 3 pools.

Ultimately, the 3 different reactions from a single sample will then be mixed at equimolar ratio and subjected to sequencing. Sequencing will be done from both directions using the two Illumina primers. 100 bp will be sequenced from each side. The maximal germline sequences encompassing the D+J segments are ~30 bp longer for BCR than TCR. Therefore, if the net result of nucleotide removal and addition at the joints (N and P nucleotides) generates a similar distribution for IgH and TCRβ, it is expected that on average 90 bp and maximally 120 bp of sequence after the C segment will be sufficient to reach the 3′ end of the V segment. Therefore, in most cases, the sequence from the C primer will be sufficient to reach the V segment. Sequencing from one of the Illumina adapters should identify the V segment used as well as identify somatic hypermutations in the V segments. Different pieces of the V segments will be sequenced depending on which of the three amplification reactions the sequence originated from. The full sequence of the BCR can be aligned from different reads that originated from different amplification reactions. The sequencing reaction from the one end showing the full CDR3 sequence will greatly facilitate the accurate alignment of different reads.

Example 5 Primary Analysis of Human Sequence Data

For each patient sample, approximately 1 million high quality paired-end reads of 100 bp each will be obtained. It is assumed these 1 million reads are independent, originating from 1 million or more RNA molecules obtained from 1 million cells or more. Reads with low quality will be eliminated.

An error rate of ~1% will be anticipated. Error can arise from the reverse transcriptase, from amplification during PCR, or during sequencing. Error proofing enzymes for the PCR steps will be used; hence sequencing errors (~1%) will be the main source of error, as the PCR error rate is less than 0.1%. The relevance of PCR and reverse transcriptase error will be greatly magnified in situations where there is a bottleneck. A bottleneck can occur, for example, if over 100,000 RNA molecules are started with, or if one of the different molecular manipulation steps is inefficient so as to make the effective population of molecules 100,000. In these situations the same error that occurred due to PCR or reverse transcriptase can appear in many clusters.

Data will be obtained for TCRβ and IgH. Given the somatic hypermutation in IgH and the difference in the amplification strategy, the primary analysis of TCRβ and IgH will be somewhat different.

TCRβ

On one end, the C′ segment primer for sequencing will be used. The length of the V segment that will be sequenced will depend on the length of N+P nucleotides added. Given the average number of added nucleotides, about 40 nucleotides of the V segment will be sequenced. On the other end the Illumina sequencing primer P7 will be used. The sequence of 20 bp of V primer sequence followed by 80 bp of V segment sequence will be obtained.

Reads will be aligned to the germline V segments (including the different known alleles) to assign a V segment to each read. Reads that don't substantially match any V segment will be discarded from further analysis. A substantial match will be defined as one where there are no more than 5 errors. Given a random error rate of 1%, it is expected that this scheme would discard <1% of reads due to error. The rest of the sequences will be assigned to the V segment that has the highest match. Primers amplifying V segments of the same family will often be highly related, with one or a few bases difference between them. These primers can “cross talk”, amplifying other family members. Therefore the beginning of the sequencing read (the primer) can be for a V segment that is different from the V segment the rest of the read belongs to. Such cases will be allowed and do not count as errors.
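The assignment rule can be sketched as follows. This is a simplified illustration rather than the alignment procedure itself: it assumes each read is already positioned against each germline reference with no insertions or deletions, so that errors can be counted position by position, and it skips the primer portion of the read so that primer cross talk is not counted as error. The function names, the dictionary layout, and the toy sequences are hypothetical.

```python
def count_mismatches(a, b):
    """Position-by-position mismatches over the shared length of two sequences."""
    return sum(x != y for x, y in zip(a, b))

def assign_v_segment(read, germline, primer_len=20, max_errors=5):
    """Return the name of the best-matching V segment, or None if even the
    best match has more than max_errors mismatches outside the primer."""
    best_name, best_errors = None, None
    for name, reference in germline.items():
        errors = count_mismatches(read[primer_len:], reference[primer_len:])
        if best_errors is None or errors < best_errors:
            best_name, best_errors = name, errors
    if best_errors is None or best_errors > max_errors:
        return None
    return best_name

# Toy example with a 4-base "primer" region instead of 20.
toy_germline = {
    "V1": "AAAACCCCGGGGTTTT",
    "V2": "AAAACCCCGGGGAAAA",
}
matched = assign_v_segment("TTTTCCCCGGGGTTTT", toy_germline, primer_len=4)
discarded = assign_v_segment("TTTTACGTACGTACGT", toy_germline, primer_len=4)
```

In the toy example the first read is assigned to V1 even though its primer region does not match, and the second read is discarded because it has more than 5 mismatches to every reference.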

The sequences from the end of the V segment to the beginning of the C segment are evaluated. This sequence will be called the DJ region. This sequence is on average 60 bases, with a very small fraction as large as 90 bases. It is likely to be possible to assign the J segment in most cases, as the end close to the C segment is likely to be preserved. On the other hand, the D segment may be difficult to assign given its small size (15-16 bp) and the trimming and addition that occurs on both of its ends. Of note is that when a J segment is assigned, it is possible to predict the D and C segment. After assigning the segments, the sequence of the DJ region will be defined. Error complicates this analysis more than it does for the C and V segment alignment for two reasons. First, error in the DJ alignment can occur in either of the two reads being aligned, effectively doubling the error rate of alignment of a single read to the database sequence.

Additionally, one base difference in the C or V alignment can be readily attributed to error (except for the rare case of a previously undescribed V germline allele), and sequences having one base difference to a V segment can be assigned to have the sequence of the V segment. On the other hand, it will not be clear whether a one base difference between two DJ region reads is due to an error or a genuine sequence difference between two clonotypes. There are two possibilities: the reads belong to the same clonotype but have errors, or there are two or more distinct clonotypes. Clonotypes will be designated as distinct when the chance of their emergence by sequencing error is low, either because of their frequent observations or because they diverge in too many bases.

It is expected that PCR error will be concentrated in some bases that were mutated in the early cycles of PCR. Sequencing error is expected to be distributed in many bases even though it will not be totally random, as the error is likely to have some systematic biases. It will be assumed that some bases will have sequencing error at a higher rate, say 5% (5 fold the average). Given these assumptions, sequencing error becomes the dominant type of error. Distinguishing PCR errors from the occurrence of highly related clonotypes will play a role in analysis. Given the biological significance of determining that there are two or more highly related clonotypes, a conservative approach to making such calls will be taken. The detection of enough of the minor clonotypes so as to be sure with high confidence (say 99.9%) that there is more than one clonotype will be considered. For example, for clonotypes that are present at 100 copies/1,000,000, the minor variant must be detected 14 or more times for it to be designated as an independent clonotype. Similarly, for clonotypes present at 1,000 copies/1,000,000 the minor variant must be detected 74 or more times to be designated as an independent clonotype. This algorithm can be enhanced by using the base quality score that will be obtained with each sequenced base. If the relationship between quality score and error rate is validated, then instead of employing the conservative 5% error rate for all bases, the quality score can be used to decide the number of reads that need to be present to call an independent clonotype. The median quality score of the specific base in all the reads can be used, or more rigorously, the likelihood of being an error can be computed given the quality score of the specific base in each read, and then the probabilities can be combined (assuming independence) to estimate the likely number of sequencing errors for that base. As a result, there will be different thresholds of rejecting the sequencing error hypothesis for different bases with different quality scores. For example, for a clonotype present at 1,000 copies/1,000,000 the minor variant is designated independent when it is detected 22 and 74 times if the probability of error were 0.01 and 0.05, respectively.
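The read-count thresholds quoted above can be approximated with a simple error model. The text does not state which distribution was used; the sketch below assumes that the number of reads of a major clonotype carrying a sequencing error at one given base follows a Poisson distribution with mean equal to the major clonotype count times the per-base error rate, and it returns the smallest count whose chance of arising from error alone is below 1 minus the confidence level. The function name is hypothetical.

```python
import math

def min_reads_for_independent(major_count, error_rate, confidence=0.999):
    """Smallest k such that P(X >= k) < 1 - confidence, where X is Poisson
    with mean major_count * error_rate (an assumed error model)."""
    mean = major_count * error_rate
    alpha = 1.0 - confidence
    pmf = math.exp(-mean)   # P(X = 0)
    cdf = 0.0               # P(X <= k - 1), starting with k = 0
    k = 0
    while True:
        tail = 1.0 - cdf    # P(X >= k)
        if tail < alpha:
            return k
        cdf += pmf
        k += 1
        pmf *= mean / k     # advance to P(X = k)

threshold_100_copies = min_reads_for_independent(100, 0.05)
threshold_1000_copies_low_error = min_reads_for_independent(1000, 0.01)
```

Under this assumed model the sketch gives 14 reads for a clonotype at 100 copies with a 5% error rate and 22 reads for a clonotype at 1,000 copies with a 1% error rate, in line with the figures in the text.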

After designating clonotypes that occur too frequently to be due to error as distinct or independent, criteria will be considered that allow designation of clonotypes as independent due to their differences in too many bases. It is expected that less than 0.1% of the time, two reads will have more than 4 errors in 60 bp between them. Therefore this will be used as a cutoff to consider two clones as independent or distinct. The algorithm that will be employed is as follows. The clonotype with the largest number of counts (clonotype 1) will be noted, and it will be determined whether there are any other clonotypes that have the same V segment and have 4 or fewer base differences from it in the DJ region. If more than one such clonotype is identified, the largest of these clonotypes will be assessed first. The rule described above will be applied to decide whether to designate the clonotype as an independent clonotype or the same as the major clonotype. If it is not designated as an independent clonotype, it will then be counted as if it has the sequence of clonotype 1. At the end of this exercise the sequence and counts for all clonotypes will be obtained. This approach ensures that clonotypes will not be designated as independent when they are not. However, some truly independent clonotypes (those that are not frequent and have a low number of differences from the major clonotype) may be misclassified as being the same. This type of error will be much less damaging than considering two clonotypes as independent when they are not.
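The coalescing procedure can be sketched as a greedy loop. The text describes the steps for the largest clonotype; the sketch assumes the same steps are then repeated for the next largest remaining clonotype until none are left. It also assumes DJ regions of equal length compared position by position (no gaps), and it takes the frequency rule as a caller-supplied function `min_reads`. All names and the toy data are hypothetical.

```python
def base_differences(a, b):
    """Differences between two DJ sequences; unequal lengths count as unrelated."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def coalesce(clonotypes, min_reads, max_diff=4):
    """clonotypes: dict mapping (v_segment, dj_sequence) -> read count.
    min_reads: function giving, for a major clonotype count, the number of
    reads a related minor variant needs to be called independent.
    Returns a dict of clonotypes with merged counts."""
    remaining = dict(clonotypes)
    merged = {}
    while remaining:
        major = max(remaining, key=lambda key: remaining[key])
        major_count = remaining.pop(major)
        threshold = min_reads(major_count)
        total = major_count
        related = [
            key for key in remaining
            if key[0] == major[0]
            and base_differences(key[1], major[1]) is not None
            and base_differences(key[1], major[1]) <= max_diff
        ]
        # Assess the largest related clonotype first, as described above.
        for key in sorted(related, key=lambda k: remaining[k], reverse=True):
            if remaining[key] < threshold:
                total += remaining.pop(key)   # treated as error, merged
            # otherwise it stays in `remaining` as an independent clonotype
        merged[major] = total
    return merged

toy = {
    ("V1", "ACGTACGTAC"): 1000,
    ("V1", "ACGTACGTAT"): 5,     # 1 difference, rare: merged
    ("V1", "ACGTACGTAG"): 50,    # 1 difference, frequent: independent
    ("V2", "ACGTACGTAC"): 3,     # different V segment: independent
    ("V1", "TTTTTTTTTT"): 2,     # more than 4 differences: independent
}
result = coalesce(toy, min_reads=lambda major_count: 22)
```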

IgH

For sequencing of the C segment end, the Illumina primer will be used. The first bases to be sequenced will be the C segment primer followed by 0-2 bases of C segment and then the DJ region. The primer sequence will identify which isotype the specific read belongs to. All or most of the DJ sequence can be obtained through the read from the C segment side. It is expected that the DJ region will be on average 80 bp. Therefore, a 100 bp read will encompass the C segment primer and the average DJ region. Some DJ regions may be as large as 120 bp, and their full reading can include sequencing data from the V region (not counting those cases where 2 D segments are found in the same IgH).

The sequence of the DJ region obtained from the C segment will initially be considered. The number of each unique sequence will be counted. As discussed for TCRβ, some of the related sequences probably originate from the same clonotype but have some sequencing and PCR error. To designate two clonotypes as distinct, it will be determined whether the difference is very unlikely to have arisen through PCR error. The same scheme as above of demanding a minimum number of independent observations of the minor clonotypes or a minimum number of differences between the two clonotypes will be used. The same rules described for TCRβ to ensure that <0.1% of clonotypes are misclassified as distinct will be employed.

Sequencing from the V region will be done using an Illumina primer. There will be 3 different primers for each V segment. The primers will be placed at approximately −200, approximately −100, and approximately −30. The first sequenced bases will be the V primer sequence followed by more of the V segment bases. Each read will be assigned to one of the three priming frames of a specific V segment. The assignment will first be done by investigating the primer sequence, since the primers have a known sequence that can't change. Primers of the same family can sometimes have some “cross talk”, amplifying highly related sequences in the same family. The primers will be used to assign the family. The specific V segment among the segments in the family will be determined by identifying the V sequence of the family that is most similar to that of the sequencing read. V segments in IgH can have somatic mutations in the course of the antibody affinity maturation process, and therefore a higher proportion of differences from the germline sequences will be allowed than for TCRβ. Antibodies with more than 25 mutations (~10%) in the VDJ region have been observed. A read will be assigned to the framework of the V segment with the closest sequence as long as it has >85% homology to it. Related clonotypes will be assessed for being independent using the same scheme as described above. Specifically, for clonotypes to be considered distinct due to somatic mutations rather than error, they need to be either sufficiently frequent or have ample variation between them to ensure that less than 0.1% of clonotypes are misclassified as distinct when they are not.

For reads that are determined to have the third framework (closest to the VD junction), the overlap sequence between the paired end reads will be determined. Bases not aligned with the V segment will be aligned to the complement of the first read to determine the overlap. The primers for the third framework will be ~30 bp away from the junction. Therefore, if the V is intact, approximately 50 bp of sequence will be used to reach the VD junction (20 bp primer + 30 bp), and 100 bp reads will allow 50 bp to be read after the junction. This is the minimum expected number of bases that would be read after the VD junction, as the deletion of some bases in V will allow for a longer read after the junction. Even for the longest DJ region, it is expected that there will be a 10 bp overlap between the sequences from the paired reads. The longest DJ region is expected to be 120 bp, and 80 bp of it are expected to be read from the C segment and 50 bp read from the other direction of the V region, leading to a 10 bp overlap between the paired reads.
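
The read-length arithmetic in the paragraph above can be checked with a short calculation. The function and parameter names below are illustrative assumptions; the numeric defaults are the values stated in the text.

```python
def bases_after_junction(read_len=100, primer_len=20, primer_to_junction=30):
    """Bases of a V-side read that extend past the VD junction (intact V)."""
    return read_len - (primer_len + primer_to_junction)


def paired_read_overlap(dj_len, c_read_into_dj=80, v_read_after_junction=50):
    """Overlap between the C-side and V-side reads across a DJ region.

    A value of zero or less would mean the paired reads do not overlap.
    """
    return c_read_into_dj + v_read_after_junction - dj_len


past_junction = bases_after_junction()      # 100 - (20 + 30)
worst_case_overlap = paired_read_overlap(120)  # 80 + 50 - 120
```

With the longest expected DJ region of 120 bp, the two reads together cover 130 bp, which is where the 10 bp overlap comes from. Shorter DJ regions, or deletions in V, give a larger overlap.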

Clonotypes that read the same sequence from the C primer but have different frameworks of the same V segment become candidates for consolidation into the same clonotype. If there is overlap between the sequences obtained from the different frameworks of the V segment, then the determination of whether the clonotypes are independent or not is done with the same rules as described above.

As a result of the above analysis, the number of reads for each clonotype can be counted. Clonotypes that are from the same family differing from each other only by somatic mutations are identified. These somatic mutations can be restricted to sequences in the V segment read from only 1 framework, more than 1 framework, or in the DJ region.

Example 6 TCR and IgH Repertoire Analysis in SLE Patient Samples

It will first be tested whether there are clonotypes that correlate with disease activity in patients. Second, a set of sequence characteristics and/or cell surface markers that distinguish clonotypes that correlate with disease from those that do not will be defined. Third, the degree to which clonotype analysis provides clinically useful information will be measured, such as the correlation with short term (e.g., 3 month) outcome.

1. Presence of Clonotypes Correlating with Disease

There will be two main tasks: identifying correlating clonotypes and measuring disease activity from their level. These tasks can be done in a clinical setting in two steps for each patient:

1) A Calibration test can be done to determine the identity of the correlating clonotypes for the specific patient. This can be done by sequencing IgH and TCRβ RNA (or linked TCRα-TCRβ sequence from a single cell) for each patient at the peak of an episode, at which time the correlating clonotypes can reach their highest levels.

2) A Monitoring test can be done to determine the level of the correlating clonotypes at a time point subsequent to the calibration test. This can be done by sequencing IgH and TCRβ RNA and determining the level of the specific correlating clonotypes that had been identified in the calibration sample of the same patient. The level of the correlating clonotypes is used to compute the disease activity at these points.

Amplification, sequencing, and primary analysis development as described above will be used to assess patient samples. Specifically, a set of systemic lupus erythematosus (SLE) patients will be assessed that have a one year follow up period and serial blood samples during this period. These patients were seen by Dr. Michele Petri at Johns Hopkins Medical School every three months for one year, and clinical measures of disease activity including Systemic Lupus Erythematosus Disease Activity Index (SLEDAI), Physician Global Assessment (PGA), as well as multiple lab tests including C3 (Complement 3) and anti-ds DNA levels are available for all visits of all patients. Drugs being administered to the patients include prednisone, plaquenil, NSAID, NSAIDType, acetylsalicylic acid (ASA) dose, plavix, diuretic, ACE-Inhibitors or angiotensin receptor blockers (ARBs), Ca channel blocker, Triam, and solumedrol. Patients who had at least one time during the follow up a significant change in disease activity, as defined by a 3 point change on the SLEDAI or a 1 point change in PGA, will be studied. Overall there are 181 patients (with a total of 815 blood samples) who fit these criteria. RNA from all these blood samples will be subjected to multiplex PCR using primers described above to amplify the sequences that encompass CDR3 in IgH and TCRβ. All the amplified materials will be sequenced (to a million reads) and the abundance of different clonotypes will be determined.

Using the clinical and sequencing data, characteristics that distinguish clonotypes whose level correlates with disease activity from those that do not will first be identified. Second, an algorithm to determine disease activity using the blood IgH and TCRβ profile will be developed.

2. Identification of Characteristics of Correlating Clonotypes

It is anticipated that clonotypes that are relevant to the disease will be increased at the time of high disease activity. However, not all enriched clonotypes at a point of high disease activity necessarily correlate with disease. For example, in a particular patient there might be 10 enriched clonotypes at the point of high disease activity, but only 5 correlate with the disease. In order to identify these relevant clonotypes, a subset of clonotypes that are clearly correlating with disease and another set that clearly do not correlate with disease will be studied. Characteristics that distinguish those two classes of clonotypes will be investigated.

All patients will have at least one significant change in disease activity during the one year follow up in this experimental design. The IgH and TCR clonotypes obtained at the peak of disease activity in each patient will be analyzed. Sets of correlating and non-correlating clonotypes will be selected from among the highest level clonotypes. Hence the first step is to define clonotypes that are at a high level. The specific criteria to choose the clonotypes that will enter the analysis will include a combination of the frequency rank of the clonotype and the level of the clonotype (number of clonotype reads per million), as well as evidence that the clonotype does not belong to the distribution of low frequency clonotypes.

This set of clonotypes from each patient sample, termed High Prevalent Clonotypes (HPC), will be further analyzed. The correlation of the level of each of these clonotypes with clinical measures will be evaluated. The correlation of SLEDAI score with the clonotype level will be computed. For each patient there will be 4-5 study points that can be used to assess the correlation of SLEDAI with the level of each HPC. The distribution of these obtained correlations will be investigated. It is anticipated that most of the HPCs will have low correlation with SLEDAI. It will be investigated whether at the high correlation end there is an excess over what is expected to be generated randomly. For example, with 4 and 5 data points it is expected that ~2.5% and ~0.6% of the correlation levels (r²) will be >0.9 by chance. A higher proportion of HPCs with r²>0.9 indicates the presence of clonotypes that correlate with disease. In addition to comparing the number of correlating clonotypes with random expectation, a permutation analysis will be performed where the correlation of SLEDAI scores from one patient and the level of individual HPCs from another will be calculated. The distribution of correlations generated from this permutation can be used as the “background” correlation. (To ensure its validity, it will be confirmed that there is little correlation of SLEDAI scores between different patients.) Excess correlation at the high correlation end, e.g., r²>0.9, will indicate the presence of clonotypes that correlate with disease. The highest correlating clonotypes will be picked as the set of correlating clonotypes. Because the number of HPCs that have a by-chance correlation higher than a set threshold is known (from calculation using random assumption or through the permutation analysis described above), the threshold to define the correlating clonotype can be set in such a way as to have a 10% false discovery rate, i.e., 10% of the correlating clonotypes set will be correlating by chance.
A set of HPCs that have very little correlation with SLEDAI score will be picked. Those will serve as the set of non-correlating clonotypes. These 2 sets of clonotypes can be further analyzed to identify characteristics that may distinguish them. These characteristics can then be looked for in new samples to identify the clonotypes likely to be correlating with disease activity in these samples. The blood levels of these clonotypes can then be followed to determine disease activity.
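
The by-chance expectation for r²>0.9 with 4 or 5 study points can be approximated numerically. The sketch below is an illustration under stated assumptions rather than the calculation actually used: it assumes independent, normally distributed values, for which the null density of the Pearson correlation r from n points is proportional to (1 − r²)^((n−4)/2) on [−1, 1], and it counts positive correlations only. The function name is hypothetical.

```python
import math


def prob_r_above(r0, n, steps=100_000):
    """P(r > r0) for n uncorrelated bivariate-normal points (one tail).

    Integrates the null density, proportional to (1 - r^2)^((n - 4) / 2),
    with a midpoint rule and normalizes over [-1, 1].
    """
    power = (n - 4) / 2.0

    def integral(lo, hi):
        h = (hi - lo) / steps
        return h * sum(
            (1.0 - (lo + (i + 0.5) * h) ** 2) ** power for i in range(steps)
        )

    return integral(r0, 1.0) / integral(-1.0, 1.0)


r0 = math.sqrt(0.9)            # r² > 0.9 with positive r
p4 = prob_r_above(r0, 4)       # 4 study points
p5 = prob_r_above(r0, 5)       # 5 study points
```

Under these assumptions the one-tailed values come out at roughly 2.6% for 4 points and 0.7% for 5 points, in line with the approximate figures quoted above. Counting negative correlations as well would double them. The permutation analysis remains the more robust background estimate, since SLEDAI scores and read counts are not normally distributed.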

One complication arises from the premise that clonotype level may change before disease activity does. Hence it is possible that by attempting to study only HPCs that highly correlate with SLEDAI, clinically useful clonotypes that change earlier than SLEDAI may be eliminated. Another set of clonotypes will be picked that correlate with a Modified SLEDAI (MSLEDAI) score. MSLEDAI is the same as SLEDAI in all the study points except those just before a significant change. For those data points the MSLEDAI score will be the average between the SLEDAI score at that point and the next study point. Clonotypes that change before SLEDAI are likely to show better correlation to MSLEDAI than SLEDAI. It will be informative to compute the excess number of HPCs that have high correlation with MSLEDAI over that expected by random or permutation generated expectations.
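
The MSLEDAI definition above can be written out as a short sketch. The function name is hypothetical, and a "significant change" is assumed here to mean a change of at least 3 SLEDAI points between consecutive study points, as used elsewhere in this Example.

```python
def modified_sledai(scores, significant_change=3):
    """Return MSLEDAI values for one patient's ordered SLEDAI scores.

    A study point immediately before a significant change is replaced by
    the average of its own score and the next score; every other point
    keeps its SLEDAI value.
    """
    modified = []
    for i, score in enumerate(scores):
        is_last = i == len(scores) - 1
        if not is_last and abs(scores[i + 1] - score) >= significant_change:
            modified.append((score + scores[i + 1]) / 2)
        else:
            modified.append(float(score))
    return modified


# A flare between the second and third visits moves the second value up.
example = modified_sledai([2, 2, 8, 8])
```

A clonotype whose level rises one visit ahead of the flare would track the modified series more closely than the original one.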

Characteristics that distinguish correlating clonotypes from those that do not correlate will then be identified. The analysis will be done in the same manner for those clonotypes that correlate with SLEDAI or MSLEDAI. In either case the goal would be for this set of characteristics to correctly recapitulate this classification, enabling the identification of correlating clonotypes in the next set of samples. It is expected that each patient will have a unique set of correlating clonotypes, but the training study will be designed to generate the rules that predict the correlating clonotypes from a calibration sample (at high disease activity). Two general types of parameters can be tested: those that are obtained from the sequencing data itself, and those that can use extra experimentation. Extra experimentation can include the assessment of different cells with different cell surface or other markers. Here are a few types of parameters that will be investigated:

1) Sequence motif: The motif can be a specific V or J region, a VJ combination, or short sequences in the DJ region that are associated with a clonotype being correlating.

2) Size of the clonotype.

3) Level: Absolute level (number of reads per million) or rank level.

4) Similarity to other clonotypes: The presence of other highly related clonotypes, like those with silent changes (nucleotide differences that code for the same amino acids) or those with conservative amino acid changes.

5) For the BCRs, the level of somatic mutations in the clonotype and/or the number of distinct clonotypes that differ by somatic mutations from some germline clonotypes.

Each of these parameters will be individually studied for association with correlating clonotypes. A threshold of 0.05 (uncorrected for multiple testing) will be set to eliminate factors that are not likely to contribute to prediction of correlating clonotypes. Given the multiple parameters, the many tests performed are likely to generate multiple positive results by chance. However, the main goal of this step is to filter the parameters to a smaller set. The set of positive parameters will then be used to create an algorithm to classify the two sets of clonotypes. A machine learning algorithm will be employed that uses the different parameters to classify the two sets of clonotypes. In order to minimize the risk of overfitting, the cross validation technique will be used. Using this algorithm each clonotype will get a score that corresponds to the likelihood it is a correlating clonotype. A threshold will then be placed to classify clonotypes above it as correlating and those below it as non-correlating. The accuracy of the classification can be estimated by the cross validation technique; for example, the clonotypes are put in equal groups and the algorithm is trained using all clonotypes except one group. Clonotypes in the last group (test group) are then classified using the algorithm that was obtained using the rest of the clonotypes. This is iterated as many times as the number of groups, and in each iteration all the groups except one are used for training and one group is classified. The accuracy of the algorithm can be estimated from the average accuracy of the different classifications in the different iterations. It is of note that in all these iterations the exact algorithm would be slightly different. The accuracy of classification is then an estimate, as it is not based on the final algorithm but rather on a set of related algorithms, each generated with training data from all clonotypes except one group.
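
The grouped cross validation described above can be sketched in a few lines. The example below is illustrative only: it replaces the unspecified machine learning algorithm with a deliberately simple one-parameter classifier (a threshold midway between the class means of a single clonotype score), and all function names are hypothetical.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and deal them into k near-equal groups."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_threshold(scores, labels):
    """Toy 'algorithm': midpoint between the two class means.

    Assumes both classes are present in the training data.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validated_accuracy(scores, labels, k=5, seed=0):
    """Average held-out accuracy over k train/test iterations."""
    accuracies = []
    for test_idx in k_fold_indices(len(scores), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(scores)) if i not in held_out]
        threshold = train_threshold(
            [scores[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum((scores[i] > threshold) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Made-up scores: 10 non-correlating (False) and 10 correlating (True).
demo_scores = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
demo_labels = [False] * 10 + [True] * 10
demo_accuracy = cross_validated_accuracy(demo_scores, demo_labels, k=5)
```

Each iteration fits a slightly different threshold, which is why the averaged accuracy describes a family of related classifiers rather than the final one trained on all clonotypes.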

Ultimately, two algorithms will be generated, trained on two different correlating clonotype sets: one correlating with SLEDAI and the other correlating with MSLEDAI. Even if the clonotypes in the training sets are different, the resulting algorithms may or may not be very different, depending on whether these clonotypes indeed come from two distinct populations. The algorithms will be compared. Additionally, these algorithms will be used to identify correlating clonotypes that were not initially in the training set. The clonotypes identified by the two algorithms will be compared, and if the initial clonotypes in the two training sets were from the same population, the identified clonotypes are likely to be very similar. Unless the results of the algorithms are quite similar, both algorithms will be carried forward to identify correlating clonotypes in order to measure lupus disease activity.

Other experimental approaches can add to the power of sequencing in identifying clonotypes that correlate with diseases. Correlating clonotypes may be enriched in cells with some surface or other markers. For example, B cells with high levels of CD27 are known in active lupus patients, and hence correlating clonotypes might be enriched in the CD27 population of cells. If that is borne out, prediction of correlating clonotypes can be improved by doing an enrichment for cells with high levels of CD27. Specifically, a sequencing reaction can be performed on the IgH sequences from all B cells in the blood sample as well as from those B cells with high CD27. Correlating clonotypes are expected to be present at higher frequency in the high CD27 population than in the whole blood sample.

3. Using IgH and TCRβ Profiles to Determine Lupus Disease Activity

The section above described clonotype-based analysis to identify features of correlating clonotypes. In addition, for that analysis only a fraction of all the HPCs were used to clearly designate clonotypes as correlating or non-correlating. This section describes analysis that is at the patient level, aiming to compute a measure of disease activity, to be called the AutoImm (AI) score. The algorithm developed per the above section will be applied to identify correlating clonotypes among all the HPCs. The level of these correlating HPCs will be determined. The level of the correlating clonotypes can be normalized to the total number of TCR clonotypes as well as to HPCs predicted not to correlate with disease. The level of these correlating clonotypes at different time points will be used to compute the AI score at these different points.

In patients with more than one correlating clonotype, the information regarding the level of these different clonotypes will be combined. In addition, data from IgH and TCRβ clonotypes will be integrated. Different algorithms for making the combination will be attempted. For example, the average, median, sum, and highest correlating clonotype level will be studied. The clonotype level can be its simple linear read counts, the logarithm of that, or some other conversion. It can potentially be the difference between correlating and non-correlating clonotypes. Furthermore, methods for weighted average can be utilized. The weighting can be based on the likelihood of a clonotype to be correlating.
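
The candidate ways of combining clonotype levels listed above can be compared side by side with a small helper. This is a sketch with hypothetical names; the log transform shown (log10 of reads per million plus one) is one possible conversion chosen for illustration, not one specified in the text.

```python
import math
import statistics


def combine_levels(levels, method="mean", weights=None, log_transform=False):
    """Combine the levels of a patient's correlating clonotypes.

    levels: reads per million for each correlating clonotype.
    weights: optional likelihoods that each clonotype is correlating,
    used only by the "weighted" method.
    """
    values = [math.log10(x + 1) for x in levels] if log_transform else list(levels)
    if method == "mean":
        return statistics.mean(values)
    if method == "median":
        return statistics.median(values)
    if method == "sum":
        return sum(values)
    if method == "max":
        return max(values)
    if method == "weighted":
        if weights is None:
            raise ValueError("weights are required for the weighted method")
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown method: {method}")


levels = [100, 300, 200]
candidates = {
    m: combine_levels(levels, method=m) for m in ("mean", "median", "sum", "max")
}
weighted = combine_levels(levels, method="weighted", weights=[1.0, 0.0, 1.0])
```

Each candidate yields a different AI score for the same sample, and the one that best tracks the clinical measure would be selected under cross validation.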

In order to evaluate which of the models is optimal, all the models will be assessed to identify the one that generates the highest correlation between the AI score and the SLEDAI score. For this analysis the correlation of SLEDAI and AI scores is done across all the data obtained from all the study points from all patients. In order to estimate and ameliorate the degree of overfitting, the cross validation technique will be used. The level of correlation measured reflects the “cross sectional” relationship between the AI and SLEDAI scores. In addition to SLEDAI, the correlation with other clinical measures like C3 and anti-ds DNA antibody levels as well as urine protein/serum creatinine for patients with kidney manifestation and blood counts for patients with hematological involvement will be studied. The correlation may be due to the classification of patients into high and low disease activity, and is not necessarily a reflection of AI correlating with SLEDAI score within a patient. To demonstrate that, “longitudinal” assessment will be done.

4. Longitudinal Analysis

In the longitudinal analysis, two general questions will be assessed: does AI score at one study point predict disease activity at the same point, and does AI score at one study point predict disease activity at a later point, e.g., the next study point 3 months later.

The relationship between AI and SLEDAI scores at the same study point will be assessed in two ways. First the correlation of the AI and SLEDAI in each patient will be calculated, and then the average and median patient correlation level will be computed. If the correlation seen in cross sectional analysis above is due to classification of high and low disease activity patients and not changing disease activity within individual patients, then the longitudinal correlation in individual patients is likely to be low. A high median patient correlation level suggests that AI does reflect the SLEDAI score at an individual patient level. In addition to the correlation of AI and SLEDAI scores, the correlation of AI with other relevant measures like C3 and anti-ds DNA antibody will be assessed as well as urine protein/serum creatinine for patients with kidney manifestation and blood counts for patients with hematological involvement.

Another way to demonstrate the ability of AI score to measure disease activity changes in individual patients is by determining its accuracy in distinguishing states of high from low disease activity in the same patients. For each of the 181 patients, the two study points when the SLEDAI was at its highest (to be called HDAP for high disease activity point) and lowest levels (to be called LDAP for low disease activity point) will be selected. The distribution of the AI of all the HDAPs will be compared with that of the AI of all the LDAPs, and the p-value that they are different will be computed. In addition, the frequency that the AI at HDAP is higher than at LDAP in each patient will be assessed. If AI does not change with disease activity in an individual patient, then it is expected that AI at HDAP is higher than that at LDAP only 50% of the time. Another analysis will be done where the fraction of times that AI at HDAP is higher than that at LDAP by a meaningful difference (i.e., above the likely AI variation) is determined. To measure the fluctuation of AI, all the study points from all the patients will be used, and the standard deviation (and relative standard deviation) of AI in the different bins of SLEDAI values can be computed. This will generate a relative standard deviation across all patients (AI-RSDall), and this value may or may not be dependent on SLEDAI (i.e., the AI-RSDall may be different at different SLEDAI values). The proportion of patients where AI at HDAP is higher than AI at LDAP by a specific number (e.g., 2) of AI-RSDall can be computed. There can be some systematic bias where the computed AI in some patients is consistently higher (or lower) than what is expected from the SLEDAI score. Therefore AI-RSDall is a combination of the intrinsic fluctuation of AI within a patient as well as the systematic difference of AI for patients with similar SLEDAI.
The intrinsic fluctuation of AI can be computed within a patient by calculating the standard deviation (and relative standard deviation) of AI scores among study points with similar SLEDAI values (<2 points difference) within a patient. The median among all the patients of the relative standard deviation can be computed (AI-RSDpt-med). The proportion of patients where AI at HDAP is higher than AI at LDAP by a specific number (e.g., 2) of AI-RSDpt-med can then be evaluated.
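
The HDAP versus LDAP comparison can be sketched as follows. The names are hypothetical, and one interpretive assumption is made and labeled in the code: since a relative standard deviation is a fraction of the mean, "higher by n RSDs" is taken here to mean that the AI difference exceeds n × RSD × (AI at LDAP). Other readings are possible.

```python
import statistics


def relative_sd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def fraction_hdap_above_ldap(patients, n_rsd=0.0, rsd=0.0):
    """Fraction of patients whose AI at HDAP exceeds AI at LDAP.

    patients: dict mapping a patient id to a list of (sledai, ai) study
    points. HDAP and LDAP are the points with the highest and lowest
    SLEDAI (first occurrence on ties).
    Assumption: the required margin is n_rsd * rsd * (AI at LDAP).
    """
    hits = 0
    for points in patients.values():
        _, ai_high = max(points, key=lambda p: p[0])
        _, ai_low = min(points, key=lambda p: p[0])
        margin = n_rsd * rsd * ai_low
        if ai_high - ai_low > margin:
            hits += 1
    return hits / len(patients)


# Made-up data: patient "a" tracks SLEDAI, patient "b" does not.
cohort = {
    "a": [(2, 1.0), (8, 3.0)],
    "b": [(4, 2.0), (10, 1.5)],
}
simple_fraction = fraction_hdap_above_ldap(cohort)
strict_fraction = fraction_hdap_above_ldap(cohort, n_rsd=2, rsd=0.5)
```

A fraction near 0.5 with no margin would be what is expected if AI did not follow disease activity within a patient. The same function can be run with either AI-RSDall or AI-RSDpt-med supplied as `rsd`.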

After demonstration that AI does indeed fluctuate with SLEDAI within individual patients, it will be evaluated whether AI can predict SLEDAI at the next study point, 3 months later. To assess that, the correlation level between the AI score at time 0 and the SLEDAI score at time +3 months can be quantitated. The correlation can be computed on a patient level and then the median patient correlation can be obtained. Another way to demonstrate the ability of AI to predict near future disease activity is to evaluate the sensitivity and specificity of AI in predicting disease activity 3 months in the future. Clinically, those patients who are doing well on their current management can be distinguished from those who are not. A patient state at a particular time will be classified into one of two classes: Poor Control (PC), which includes patients who in 3 months will have high disease activity (SLEDAI >6 points) and/or a flare (SLEDAI increase by 3 points), and Good Control (GC), which includes patients who in 3 months will have low or moderate disease activity (SLEDAI <6) and/or a significant reduction in disease activity (SLEDAI decrease by 3 points). The classification sensitivity and specificity can then be evaluated using different thresholds of AI. A ROC curve that describes the performance of AI in predicting the state of the patient (PC or GC) 3 months ahead of time can be generated. The performance obtained by this test will be compared with that of standard clinical measures including SLEDAI, anti-ds DNA and C3 levels.
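
The PC/GC classification and the threshold sweep behind the ROC curve can be sketched as below. The function names are hypothetical. The PC and GC definitions in the text can overlap and leave a SLEDAI of exactly 6 unassigned, so this sketch makes a simplifying assumption: a state is PC if the later SLEDAI is above 6 or has risen by at least 3 points, and GC otherwise.

```python
def control_state(sledai_now, sledai_in_3_months):
    """Label a study point by the disease activity seen 3 months later.

    Assumption: anything that does not meet the PC criteria is GC.
    """
    change = sledai_in_3_months - sledai_now
    if sledai_in_3_months > 6 or change >= 3:
        return "PC"
    return "GC"


def roc_points(ai_scores, states):
    """Sensitivity and specificity for calling PC when AI >= threshold.

    Returns a list of (threshold, sensitivity, specificity), one entry
    per distinct AI value. Assumes both classes are present.
    """
    n_pc = states.count("PC")
    n_gc = states.count("GC")
    points = []
    for threshold in sorted(set(ai_scores)):
        true_pos = sum(
            1 for a, s in zip(ai_scores, states) if a >= threshold and s == "PC"
        )
        true_neg = sum(
            1 for a, s in zip(ai_scores, states) if a < threshold and s == "GC"
        )
        points.append((threshold, true_pos / n_pc, true_neg / n_gc))
    return points


# Made-up AI scores at time 0 and SLEDAI pairs (now, 3 months later).
ai_at_time_0 = [1.0, 2.0, 3.0, 4.0]
sledai_pairs = [(4, 5), (3, 2), (4, 8), (2, 5)]
labels = [control_state(now, later) for now, later in sledai_pairs]
curve = roc_points(ai_at_time_0, labels)
```

Plotting sensitivity against 1 − specificity over all thresholds gives the ROC curve, and the same sweep can be repeated with SLEDAI, anti-ds DNA or C3 in place of AI for comparison.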

An analysis to evaluate the ability of AI to predict changes in SLEDAI scores 3 months later will also be conducted. Using data from all study points of all patients, the relationship between AI and SLEDAI scores can be plotted to identify the “cross sectional” correlation level as discussed above. This determines the relationship between SLEDAI and AI at the same study point. This relationship will be fit with an equation allowing the prediction of the SLEDAI score given an AI score (or vice versa). If AI predicts flares, then changes in SLEDAI at some study point 1 will be preceded by changes in AI at point 0. Therefore, if a flare occurs between point 0 and 1, the AI score at point 0 (to be called AImeas) will be higher than what is expected (to be called AIexp) given the SLEDAI at study point 0. On the other hand, with no change in disease activity between study point 0 and study point 1, the AI score at point 0 will be very similar to what is expected given the SLEDAI at study point 0. The relative AI change (Rel-AI-diff) can be computed by dividing the difference of AImeas and AIexp by AImeas. The sensitivity and specificity of AI in predicting a significant change in SLEDAI 3 months later can be evaluated by using different thresholds of Rel-AI-diff. The thresholds can be bidirectional, so if the Rel-AI-diff at a specific study point is higher than a specific threshold a flare is predicted, and similarly if it is lower than the negative of the specific threshold a significant reduction in SLEDAI is expected. On the other hand, when the Rel-AI-diff at a study point is between the threshold and its negative, no significant change in disease activity is expected. A ROC curve showing the trade-off between sensitivity and false positives can be generated using many different thresholds of Rel-AI-diff. Similar ROC curves can be generated using standard clinical measures including SLEDAI, anti-ds DNA and C3 levels.
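
The Rel-AI-diff calculation and its bidirectional threshold can be expressed directly. The function names and the returned labels are illustrative; AIexp is assumed to come from whatever equation is fitted to the cross sectional AI versus SLEDAI relationship.

```python
def rel_ai_diff(ai_meas, ai_exp):
    """Relative AI change: (AImeas - AIexp) / AImeas."""
    return (ai_meas - ai_exp) / ai_meas


def predict_change(ai_meas, ai_exp, threshold):
    """Apply the bidirectional threshold to one study point."""
    diff = rel_ai_diff(ai_meas, ai_exp)
    if diff > threshold:
        return "flare"
    if diff < -threshold:
        return "reduction"
    return "no significant change"


# AI measured well above, well below, and close to its expected value.
calls = [
    predict_change(10.0, 5.0, threshold=0.3),
    predict_change(5.0, 10.0, threshold=0.3),
    predict_change(10.0, 9.5, threshold=0.3),
]
```

Because the difference is divided by AImeas rather than AIexp, the measure is asymmetric: a measured score at half the expected value gives −1.0, while a measured score at twice the expected value gives +0.5.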

If the fluctuation of AI varies at different SLEDAI values, the above analysis will be refined. A section above described the computation of AI-RSDall and AI-RSDpt-med and mentioned evaluating whether they change at different SLEDAI values. If they do, then the ROC analysis can be done as described above, but instead of using different thresholds of Rel-AI-diff, different thresholds of AI-RSDall and AI-RSDpt-med will be used. The performance obtained by the test will be compared with that of standard clinical measures including SLEDAI, anti-ds DNA and C3 levels.

In the above analysis, attempts are made to predict the SLEDAI at point 1 from the AI score at point 0. It is likely that in addition to the absolute level at point 0, the change of AI from point −1 to 0 will be informative in predicting SLEDAI at point 1. For example, consider a patient who has at study point −1 an AI score of X−1, and at point 0 the AI score is increased to a new value X0 that is appreciably higher than X−1. This patient may have a higher likelihood of a flare at point 1 than a patient whose AI has been stable at X0 at study points −1 and 0. This concept of AI change or velocity will be incorporated to generate a Modified AI (MAI) score. To generate a MAI at point 0, the AI score at point −1 and at point 0 will be needed, and hence one data point per patient will not have an MAI associated with it. The specific formula to incorporate the velocity into the AI calculation to obtain MAI will be optimized. This optimization may be done through maximization of the correlation of MAI and SLEDAI three months later. The cross validation design will be used to evaluate and control the degree of overfitting. Correlation can be done for data points of all samples, but also can be done at a patient level, and the median correlation among all patients can be assessed. The latter approach ameliorates the issue of some patients having a systematic bias of too low or too high an AI score. Using MAI, the same type of ROC analysis that was mentioned for AI can be performed to assess its ability to predict SLEDAI 3 months later. First, analogously to what is described for AI, an analysis can be done to show the ability of MAI at point 0 to distinguish PC and GC states at point 1. Additionally, an analysis similar to what was described for AI can be performed to assess the ability of MAI at point 0 to predict significant disease activity change (3 points change on SLEDAI) between points 0 and 1. For this latter analysis different thresholds of Rel-AI-diff, AI-RSDall or AI-RSDpt-med can be used.
The performance of MAI will be compared with that of AI to determine whether the addition of the velocity factor is useful.
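
One possible form of the velocity adjustment is sketched below, purely as an illustration. The text leaves the MAI formula to be optimized, so the linear form used here (AI at point 0 plus a weight times the change from point −1) and the `velocity_weight` parameter are assumptions, not the formula of the method.

```python
def modified_ai(ai_series, velocity_weight=1.0):
    """Hypothetical MAI: current AI plus a weighted change since the last visit.

    ai_series: one patient's AI scores in visit order. The first visit
    has no preceding point, so the result is one element shorter.
    """
    return [
        current + velocity_weight * (current - previous)
        for previous, current in zip(ai_series, ai_series[1:])
    ]


# Rising then stable AI: the rise is amplified, the stable point is unchanged.
mai_values = modified_ai([1.0, 3.0, 3.0], velocity_weight=0.5)
```

Under this form, a weight of zero reduces MAI to AI, so the weight could be tuned under cross validation and compared against that baseline.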

One complication of the described study is that treatment changes are made for different patients during the follow up period of the study. This is likely to complicate the prediction of disease activity. For example, consider two patients with the same AI score at point 0, one of whom had a reduction in medication at the same time. The likelihood of this patient having a rise in disease activity at point 1 is then likely to be higher than for the patient who did not change medications at point 0. This is likely to lead to underestimation of the performance of AI. One way to alleviate that is to eliminate all the points with significant medication changes from the study. Another is to modify the AI score to include whether a patient has a medication change and create a medication-modified AI. So in the example above with the two patients, the one with the medication change will have a higher medication-modified AI.

5. Integration with Other Predictive Markers

The predictive ability of the disease activity marker can be maximized. Therefore the predictive ability of the TCR/BCR repertoire information integrated with other markers will be tested. These markers include standard markers used in the clinic like anti-ds DNA and C3 levels. They will also include other markers that are published. For example, a panel of chemokines has already been shown to have some predictive ability using the same set of patients as will be used. Whether this panel will increase the predictive ability of the TCR and BCR repertoire will be evaluated. The first step is to integrate the AI score with the additional measure to generate an Expanded AI (EAI) score. Different ways to do the integration can be assessed, and this can be optimized through maximization of the correlation of EAI and SLEDAI three months later. The cross validation design will be used to evaluate and control the degree of overfitting. Using EAI, the ability to predict disease activity 3 months later will be assessed by its ability to distinguish GC from PC and to predict changes in disease activity. The performance in measuring disease activity and change in disease activity can be described through ROC analysis as described above.

6. Validation

The number of variables being tested is high compared with the number of samples. This can lend itself to overfitting, with initially promising results not being able to be validated in later studies. A cross validation approach will be used in the training to get a measure of the extent of overfitting. However, a validation on an independent set of samples will be involved in later work. This is not part of this proposal, but this marker can be clinically applicable. Using the data obtained above, it can be determined whether AI, MAI, or EAI should be validated and the specific way to compute the measure of interest. One specific algorithm will be taken for validation. In addition, one or more specific endpoints will be specified. The sensitivity and specificity of AI can be assessed in the ability to distinguish GC from PC 3 months later to evaluate the ability of AI to predict disease activity. In another example, the sensitivity and specificity of AI to predict significant disease activity change in 3 months using a specific Rel-AI-diff threshold can be assessed.

Example 7 Measuring Response of an SLE Patient to Drug Therapy

The methods of the provided invention will be used to measure the response of an SLE patient to drug therapy. Determination of whether an SLE patient being given an expensive drug with serious side effects is responding to the drug plays a role both in patient care and in making the administration of such care cost effective. Many clinical indicators of disease activity respond to treatment imprecisely and after a time lag of up to several months. During this time, disease may progress and side effects may add complications to therapy. A prompt understanding of the drug response would allow patients to be switched to more effective therapies more rapidly.

In this Example, a 35 year old African American female with a prior diagnosis of lupus presents to her regular rheumatologist. The patient's disease status is assessed on a quarterly basis through a comprehensive clinical assessment in addition to laboratory testing including measurement of C3, anti-ds DNA antibody levels, blood counts, and urinalysis. During one visit the patient complains of skin lesions and fatigue, and urinalysis shows evidence of proteinuria and/or cell casts. The rheumatologist refers the patient to a nephrologist for a kidney biopsy to assess inflammatory status of the kidney and orders serum creatinine and 24 hour urine protein to creatinine ratio to assess the degree of the impairment of the kidney function. A kidney biopsy shows evidence of diffuse lupus nephritis, while the urine protein to creatinine test reveals evidence of nephrotic syndrome (urine protein to creatinine ratio of 3.6). Based on this information a diagnosis of acute lupus nephritis is given and the patient is begun on a course of drug therapy. There are several possible drugs that can be chosen at this point. Immunomodulators such as mycophenolate mofetil (Cellcept) are often used, although sometimes in severe cases drugs such as methotrexate, azathioprine (Imuran), or cyclophosphamide (Cytoxan) are prescribed. Rituximab (Rituxan) is also sometimes used as a second or third choice. One of these drugs is often used in combination with a systemic steroid such as prednisone or methylprednisolone in order to suppress the acute symptoms. Here, mycophenolate mofetil is prescribed at 150 mg per day alongside 60 mg of prednisone. Given the many side effects of steroids, including the risk of osteoporosis, hyperglycemia, weight gain, and other Cushingoid symptoms in the long term, the patient's prednisone dose is tapered over ~6 weeks if the clinical picture allows that.

The first question to be determined is whether the patient is responding to therapy and, as a result, whether the dose of steroid can be appropriately decreased. Therefore, during this period the patient's serum creatinine as well as urine protein and creatinine are followed to ensure the patient is responding to the medications. Frequent kidney biopsy could be done to detect whether the inflammatory damage is being reversed; however, routine use of kidney biopsy carries too great a risk and is too invasive to be practical. Current blood based markers that are being used to assess inflammatory status are of limited use in making this decision in that they are not sufficiently well correlated with underlying disease to be relied upon to risk the increased side effects that accompany high doses of steroids. Serum and urine function markers may have some delay in detecting improvement in inflammatory status, and hence steroids may be tapered before these markers show a definitive change, thereby extending the period of the renal flare. A slower taper, informed by more sensitive markers, could in these cases have shortened the flare period, preventing further damage to kidney tissue. After the reduction of steroid to a maintenance dose of approximately 10 mg, the patient may show persistently elevated levels of protein in the urine and a high urine protein to creatinine ratio of 2, and the physician must now decide whether to switch from Cellcept to another medication. Arguing in favor of this is the continued evidence of loss of kidney function, but without an accurate measure of inflammatory kidney status it can be difficult to know whether the disease itself is in remission, having nevertheless done some level of irreversible kidney damage that is resulting in these persistent levels of proteinuria. Here again the existing blood based markers are imperfectly informative and further kidney biopsies are not practical. This decision would be greatly aided by an accurate blood based measure of disease status.

AutoImm Load would be very helpful in this situation to assess the response to therapy by measuring disease activity either alone or in combination with other markers of disease activity. An algorithm for AutoImm Load will be developed using the study described above. The correlating clonotypes that will be used to calculate AutoImm Load will be measured using a calibration test. This calibration test will be done using blood from a patient at a time of peak disease activity, for example at the start of therapy. The calibration test will be performed using blood or alternatively using the tissue that is affected (e.g. kidney biopsy or skin biopsy). At a later time at which the response to therapy is to be assessed, a blood sample will be taken and used along with the calibration test to measure AutoImm Load. This will be used to make a treatment decision. If the correlating clonotypes are derived from a population study, there is no need for the calibration test, and a blood test at the time at which the response to therapy is to be assessed is sufficient to measure AutoImm Load in order to inform the treatment decision.
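The calibration-then-measurement flow described above can be sketched as follows. This is only an illustration: the AutoImm Load algorithm itself is to be developed from the study described above, so the function names, the frequency cutoff used to select correlating clonotypes at calibration, and the use of a simple summed frequency as the score are all assumptions made here for clarity.

```python
from collections import Counter

def clonotype_frequencies(reads):
    """Convert a list of clonotype reads into fractional frequencies."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {clonotype: n / total for clonotype, n in counts.items()}

def calibrate(peak_reads, min_frequency=0.05):
    """Hypothetical calibration test: keep clonotypes at or above a
    frequency cutoff in the sample taken at peak disease activity."""
    freqs = clonotype_frequencies(peak_reads)
    return {c for c, f in freqs.items() if f >= min_frequency}

def autoimm_load(sample_reads, correlating):
    """Hypothetical score: summed frequency of the correlating clonotypes
    in a later blood sample (the real algorithm is not specified)."""
    freqs = clonotype_frequencies(sample_reads)
    return sum(freqs.get(c, 0.0) for c in correlating)

# Toy data: two expanded clonotypes plus a diverse background.
peak = ["A"] * 30 + ["B"] * 10 + [f"bg{i}" for i in range(60)]
later = ["A"] * 5 + ["B"] * 1 + [f"bg{i}" for i in range(94)]

correlating = calibrate(peak)             # selected at peak disease
peak_load = autoimm_load(peak, correlating)
later_load = autoimm_load(later, correlating)
print(sorted(correlating), peak_load, later_load)
```

In this toy case the score falls between the calibration sample and the later sample, which is the kind of change that would be read as a response to therapy under the assumptions stated above.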

Example 8 Determination of Appropriate Time to Taper or Stop Therapy for an SLE Patient

The methods of the provided invention can be used to determine the appropriate time to taper or stop therapy for an SLE patient. In addition to the time lag that can be exhibited by the clinical measures of disease activity, a further difficulty lies in the lack of sensitivity of these measurements. Subclinical disease can nonetheless result in a re-flaring of the disease if therapy is tapered too early. As a result, courses of immunosuppressant therapy are typically administered for a time period that is much longer than is necessary for the average patient, to ensure that the risk of re-flaring is low for the average patient, yet may still not be long enough for the tail end of the distribution. Therefore significant over-treatment, causing side effects and costs, occurs in most patients, while under-treatment of some patients occurs, causing potentially preventable re-flares. A method that could measure subclinical activity that was predictive of the risk of re-flaring would allow therapy to be tapered based on such measures instead of relying on overtreatment by design.

In this example, the patient from Example 7 is on prednisone and mycophenolate mofetil for a period of 6 months and the urine protein to creatinine ratio returns to a level of 0.5. This level remains above the baseline level expected in healthy individuals, but it is not clear whether this level is due to some kidney damage that is not reversible. Other clinical measures of inflammation are normal and the patient does not report any other symptoms. At the same time the patient is experiencing moderate levels of nausea and weight gain as possible side effects of the medications, which additionally have serious long term side effects. The doctor is faced with a difficult decision: balancing the fear of tapering the Cellcept and/or steroid too quickly, which could result in renewed kidney inflammation and likely further long term irreversible kidney damage, against the adverse reactions that can occur due to the medications. Here again an unambiguous assessment of the disease status without having to perform a kidney biopsy would play a role in making this decision. Attempts to reduce steroids are recommended through repeated trials of steroid tapering, leading to the recurrence of the same clinical dilemma. In fact this question arises every time the patient is in remission and is on steroids or immunomodulators.

AutoImm Load would be very helpful in this situation to assess whether or not to taper therapy by measuring disease activity either alone or in combination with other markers of disease activity. An algorithm for AutoImm Load will be developed using the study described above. The correlating clonotypes that will be used to calculate AutoImm Load will be measured using a calibration test. This calibration test will be done using blood from a patient at a time of peak disease activity, for example at the start of therapy. The calibration test could be performed using blood or alternatively using the tissue that is affected (e.g. kidney biopsy or skin biopsy). At a later time at which the level of disease activity is to be assessed, a blood sample can be taken and used along with the calibration test to measure AutoImm Load. This will be used to make a treatment decision and to evaluate whether the patient has any detectable disease activity. If the correlating clonotypes are derived from a population study, there is no need for the calibration test, and a blood test at the time at which the response to therapy is to be assessed is sufficient to measure AutoImm Load in order to inform the treatment decision.

Example 9 Prediction of Flares in an SLE Patient

One challenge in treating SLE patients is that flares arise without warning, thus thwarting physicians' efforts to treat the disease preventively. Waiting for flares to occur before beginning treatment subjects patients to potentially destructive clinical symptoms, can involve expensive and inconvenient hospitalization, and may cause long term organ damage while also necessitating aggressive therapeutic interventions that are themselves fraught with side effects. A much more desirable therapeutic paradigm would be one in which flares are detected at a subclinical phase, at which time therapy could be administered proactively, saving significant suffering for the patient, resulting in less expensive hospitalizations, and ultimately enabling a better long term prognosis for the patients.

The patient from Example 7 is recovering from the acute flare described above, and the patient is tapered off of all therapies except Plaquenil and a low dose of 5 mg of prednisone. Nevertheless this patient remains at a high risk of having another inflammatory episode. As a result, this patient will remain in the care of a rheumatologist who will continue following the patient's clinical symptoms and laboratory tests. Unfortunately these symptoms and tests do not provide early warning of an imminent flare until patients actually have exhibited clinical symptoms of a flare, and the sequence repeats itself. A highly specific marker of increasing subclinical activity could be included in the routine clinical assessment of the patient in order to detect unambiguous signs of a flare which may reach a clinically detectable stage within the subsequent 1-3 months. Beginning therapies earlier might make the flare less severe and may allow treatment to be accomplished with less long term organ damage or fewer steroids than is currently the case.

AutoImm Load would be very helpful in this situation to assess the likelihood of an incipient flare by measuring disease activity either alone or in combination with other markers of disease activity. This score, either by itself or through its rate of increase (velocity) or acceleration, can be used to assess the likelihood of progression to a flare. An algorithm for AutoImm Load could be developed using the study described above. The correlating clonotypes that will be used to calculate AutoImm Load could be measured using a calibration test. This calibration test could be done using blood from a patient at a time of peak disease activity, for example at the start of therapy. The calibration test could be performed using blood or alternatively using the tissue that is affected (e.g. kidney biopsy or skin biopsy). At a later time at which the flare risk is to be assessed, a blood sample can be taken and used along with the calibration test to measure AutoImm Load. This can be used to make a treatment decision. If the correlating clonotypes are derived from a population study, there is no need for the calibration test, and a blood test at the time at which the flare risk is to be assessed is sufficient to measure AutoImm Load in order to inform the treatment decision.
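The velocity and acceleration of the score mentioned above can be estimated from serial measurements by finite differences. The following is a minimal sketch, assuming scores taken at known time points; the function name, the toy values, and the choice of the last three measurements are illustrative assumptions, not part of the described method.

```python
def score_velocity_and_acceleration(times, scores):
    """Estimate the latest rate of change (velocity) and its change
    (acceleration) from the last three score measurements."""
    if len(times) != len(scores) or len(times) < 3:
        raise ValueError("need at least three matched time points")
    t0, t1, t2 = times[-3:]
    s0, s1, s2 = scores[-3:]
    v_prev = (s1 - s0) / (t1 - t0)   # slope over the earlier interval
    v_last = (s2 - s1) / (t2 - t1)   # slope over the latest interval
    # The two slopes are centred on the interval midpoints, which are
    # (t2 - t0) / 2 apart in time.
    accel = (v_last - v_prev) / ((t2 - t0) / 2)
    return v_last, accel

# Toy series: hypothetical monthly scores that rise faster each month.
months = [0, 1, 2, 3]
scores = [0.02, 0.03, 0.05, 0.09]
velocity, acceleration = score_velocity_and_acceleration(months, scores)
print(velocity, acceleration)
```

A positive velocity together with a positive acceleration would indicate a score that is rising at an increasing rate; how such values map to flare risk would have to be established by the study.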

Example 10 Objective Measure to Assess Subjective Symptoms of SLE Patients

SLE affects many organs and produces many potential symptoms, including ones that are very common in the healthy population. For example, if an SLE patient complains of a headache, the headache may be a sign of CNS lupus or can be due to the common headache. Similarly, if SLE patients complain of worsening fatigue over a period of time, the worsening fatigue may be due to deterioration of their disease or can be due to depression or other causes. The availability of an objective measure that reflects disease activity can be of great help in the management of SLE patients.

The patient in Example 7 presents to the rheumatologist with chief complaints of headache, fatigue, and difficulty with concentration. The patient's headache is recurrent and only transiently gets better with Motrin treatment. The patient's SLE is otherwise in good control. Relevant psychosocial stressors in the patient's life include that she is going through a divorce. Physicians are in a dilemma when they face SLE patients with symptoms that are non-specific to SLE and are common in the general population. Is the patient suffering from CNS lupus? Or could she be suffering from other common causes of her symptoms, like depression? Current laboratory tests lack the sensitivity and specificity to be relied on to distinguish these possibilities. A reliable test to measure SLE disease activity can be utilized routinely to help in distinguishing the two possibilities.

AutoImm Load would be very helpful in this situation to objectively assess the disease activity either alone or in combination with other markers of disease activity. An algorithm for AutoImm Load will be developed using the study described above. The correlating clonotypes that will be used to calculate AutoImm Load will be measured using a calibration test. This calibration test will be done using blood from a patient at a time of peak disease activity, for example at the start of therapy. The calibration test will be performed using blood or alternatively using the tissue that is affected (e.g. kidney biopsy or skin biopsy). At a later time at which the objective disease activity is to be assessed, a blood sample can be taken and used along with the calibration test to measure AutoImm Load. This will be used to make a treatment decision. If the correlating clonotypes are derived from a population study, there is no need for the calibration test, and a blood test at the time at which the objective disease activity is to be assessed is sufficient to measure AutoImm Load in order to inform the treatment decision.

Example 11 Measuring Response to Drug Therapy of an MS Patient

As stated above, one of the principal challenges in MS therapy is measuring how well and whether a patient is responding to a drug therapy. During progressive and late stage disease there are clinical assessments such as the Expanded Disability Status Scale (EDSS) which measure the degree of physical impairment that has resulted from the disease. However, these assessments are not useful in early stage or relapsing/remitting disease. Clinical parameters around relapses can be used to assess disease progression, but these are coarse and lagging indicators, as patients can go several years between relapses, during which little evidence can be gleaned from clinical assessments. Lastly, brain imaging such as gadolinium enhanced MRI can be used to examine brain lesions. MS patients are typically given such an MRI on a yearly basis. However, such images lack specificity. Furthermore, as a measure of integrated brain damage, they are not good measures of current disease activity but rather reflect the history of the disease and its impact on the brain.

While it is true that the current clinical treatment paradigm for MS is that patients diagnosed with relapsing remitting disease should be under continuous therapy in order to delay the onset of progressive disease, the increasing repertoire of approved drugs to treat MS makes the lack of biological feedback increasingly problematic. The list shown above of approved drugs to treat MS continues to get longer as the substantial investment in MS therapies begins to bear fruit. Each of these drugs has serious side effects and is very expensive to administer, with costs from $30,000-$100,000 per year of treatment. Patients who are not well managed will transition sooner to progressive disease, which is debilitating and requires expensive health care interventions including hospitalizations and long term care. Hence, it is important that the patient receive optimal therapy early in treatment.

Clinical Utility Example

Patient profile: A 30 year old female comes to the hospital with monocular visual impairment with pain. She is given a neurological assessment and a lumbar puncture to obtain cerebrospinal fluid, which is used to assess whether clonal T cells are present. She also is referred for a brain MRI. Based on these tests, a diagnosis of MS is made. She is prescribed Betaseron 250 mcg per injection to be self-administered subcutaneously every other day. At a follow-up visit six months later, the patient is complaining of depression and weight gain. No further neurological events have been reported to the physician. The doctor is now faced with a clinical dilemma. Should the doctor maintain the therapy as it is being administered? Should a new therapy be used? Should the doctor order an MRI, incurring cost and subjecting the patient to additional contrast exposure? Should the doctor wait until the next scheduled MRI shows new lesions? Should the doctor wait to see if flares recur? All of these decisions would benefit from an unambiguous measure of whether the disease is active or not.

AutoImm Load would be very helpful in this situation to assess the response to therapy by measuring disease activity either alone or in combination with other markers of disease activity. An algorithm for AutoImm Load will be developed using the studies described herein. The correlating clonotypes that will be used to calculate AutoImm Load will be measured using a calibration test. This calibration test will be done using blood from a patient at a time of peak disease activity, for example at the start of therapy. The calibration test could be performed using blood or alternatively using the tissue that is affected (e.g. CSF). At a later time at which the response to therapy is to be assessed, a blood sample can be taken and used along with the calibration test to measure AutoImm Load. This can be used to make a treatment decision. If the correlating clonotypes are derived from a population study, there is no need for the calibration test, and a blood test at the time at which the response to therapy is to be assessed is sufficient to measure AutoImm Load in order to inform the treatment decision.

Example 12 Prediction of MS Flares

As in all autoimmune diseases, the amelioration of flares is a principal goal of therapy. Not only are flares debilitating for the patient and expensive to treat, but it is increasingly believed that each flare contributes to longer term irreversible disease progression. Several therapies can be used to control incipient flares, such as IV methylprednisolone or oral prednisone. Such medications have significant side effects and as such are not prescribed without evidence of an active flare. A measure of increasing subclinical activity that was correlated with subsequent clinical flares could be used to inform this sort of proactive flare treatment, which could result in shorter and less damaging flares. In addition there are therapies that demonstrate high clinical efficacy for reduction of flares but carry risks of very significant and lethal side effects. One such drug is Tysabri, a drug that has been shown both to result in improved clinical outcomes and to increase the risk of deadly brain infections such as PML. These risks have reduced the value of such drugs to last line therapy, when other drugs are proving to no longer control progression, and have limited the value of these drugs as chronic treatments. A test that could predict when the flare state is incipient could increase the utility of such drugs, as they could be used in a manner similar to steroids to control acute flare periods while minimizing the risks of lethal side effects.

Clinical Utility Example

The patient from Example 11 is on Betaseron for 3 years and reports a clinical flare that lasts a week. The patient's MRI at the end of the year shows significant new lesions (multiple discrete variable sized ovoid perpendicularly directed T2W and FLAIR hyperintense lesions (plaques), appearing iso-hypointense on T1W images and hyperintense on T2W images, involving bilateral periventricular and subcortical white matter regions, including the calloso-septal interface). The doctor is concerned that the patient is at high risk of flares over the course of the next 12 months. A clinical dilemma presents itself. Does the doctor wait for further clinical symptoms to intervene with additional therapy? Should the doctor switch therapies? If so, should another class of injectable be used, such as Copaxone, or should a new class of therapy be used, such as Tysabri? Should steroids be prescribed? A test that could monitor subclinical disease activity and show when the disease is increasing and when a flare is likely to result could be used to help make these clinical decisions.

AutoImm Load would be very helpful in this situation to assess the risk of flare by measuring disease activity either alone or in combination with other markers of disease activity. An algorithm for AutoImm Load could be developed using the studies described in this invention. The correlating clonotypes that will be used to calculate AutoImm Load could be measured using a calibration test. This calibration test could be done using blood from a patient at a time of peak disease activity, for example at the start of therapy. The calibration test could be performed using blood or alternatively using the tissue that is affected (e.g. CSF). At a later time at which the risk of flare is to be assessed, a blood sample can be taken and used along with the calibration test to measure AutoImm Load. This can be used to make a treatment decision. If the correlating clonotypes are derived from a population study, there is no need for the calibration test, and a blood test at the time at which the flare risk is to be assessed is sufficient to measure AutoImm Load in order to inform the treatment decision.

Example 13 Monitoring Therapy Compliance for MS

Because of the relative infrequency of clinical symptoms in the early stages of the disease, the interactions between a patient and his or her physician are not very frequent. At the same time, the therapies that are being prescribed are both expensive and inconvenient for the patient, involving self injections that can cause painful reactions and side effects. There is as a result a significant degree of noncompliance with therapeutic regimens, which is hard for a physician to monitor as the interactions between the patient and doctor are not routine. A test that could measure the state of the subclinical disease would allow both doctor and patient to see on a routine basis how well controlled the underlying disease is. Such methods have proved very effective in HIV patients in motivating them to pursue therapy effectively. A blood test that was performed quarterly would allow the physician to see the patient and measure the state of the disease.

AutoImm Load would be very helpful in this situation to assess the compliance with therapy by measuring disease activity either alone or in combination with other markers of disease activity. An algorithm for AutoImm Load will be developed using the studies described herein. The correlating clonotypes that will be used to calculate AutoImm Load will be measured using a calibration test. This calibration test will be done using blood from a patient at a time of peak disease activity, for example at the start of therapy. The calibration test could be performed using blood or alternatively using the tissue that is affected (e.g. CSF). At a later time at which the compliance with therapy is to be assessed, a blood sample will be taken and used along with the calibration test to measure AutoImm Load. This will be used to make a treatment decision and to guide the patient toward better compliance. If the correlating clonotypes are derived from a population study, there is no need for the calibration test, and a blood test at the time at which the compliance with therapy is to be assessed is sufficient to measure AutoImm Load in order to inform the treatment decision.

Example 14 Amplification of Mouse TCRβ and IgH Sequences

An amplification and sequencing scheme for mouse TCRβ and IgH will be developed that is similar to that developed for humans. Similar methods to minimize the differences in amplification efficiency of different sequences and similar validation techniques using spikes and the 5′ RACE technique described above will be applied. The minimum input amount of cDNA will be determined using a methodology similar to that described for human samples. One difference in the amplification scheme between mouse and humans is that the two C segments for TCRβ in mouse do not have any polymorphisms in the 50 bp closest to the J/C junction. Therefore, in the scheme the primer for the first stage amplification will be placed at positions 25-50 and the primer for the second stage amplification will be placed at positions 1-25, and the latter primer will have a 5′ tail containing the P5 sequence. The use of different sequences will improve specificity and is similar to the strategy used in humans, except there is no need to “loop out” any bases for polymorphisms.

Example 15 Primary Analysis of Mouse Sequence Data

The analysis framework that will be used for analysis of mouse data is similar to that described above for the human data. One difference is that the mouse samples will be sequenced to less depth than the human samples. It is anticipated that the blood samples from the mouse will be 100 μl. In 100 μl of blood there are ~100K lymphocytes, and hence sequencing to a depth much higher than 100K does not significantly improve the precision. Therefore, only 100K reads for each mouse sample will be obtained. Even though the number of reads will be smaller for mouse than for humans, a larger fraction of mouse total and blood lymphocytes will be sampled. The number of total mouse lymphocytes is expected to be more than 3 orders of magnitude smaller than that of humans. Similarly, 100 μl of blood will provide a better sampling (~10%) of the lymphocytes in the mouse blood at the time, compared to the sampling obtained using 10 ml of human blood (0.2%).
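The sampling fractions quoted above follow from simple arithmetic. The sketch below reproduces them under assumed total blood volumes of roughly 1 ml for the mouse and 5 L for the human; these volumes are not stated in the text and are chosen only because they are consistent with the ~10% and 0.2% figures.

```python
def sampled_fraction(sample_volume_ml, total_blood_volume_ml):
    """Fraction of circulating blood (and hence of blood lymphocytes,
    assuming uniform mixing) captured by one sample."""
    return sample_volume_ml / total_blood_volume_ml

# Assumed total blood volumes (not given in the text).
MOUSE_BLOOD_ML = 1.0
HUMAN_BLOOD_ML = 5000.0

mouse_fraction = sampled_fraction(0.1, MOUSE_BLOOD_ML)    # 100 ul sample
human_fraction = sampled_fraction(10.0, HUMAN_BLOOD_ML)   # 10 ml sample

# Reads per lymphocyte in the mouse sample: ~100K reads for ~100K cells,
# so deeper sequencing adds little new information.
reads_per_cell = 100_000 / 100_000

print(f"mouse: {mouse_fraction:.1%}, human: {human_fraction:.1%}")
print(f"reads per lymphocyte in the mouse sample: {reads_per_cell:.1f}")
```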

Example 16 IgH and TCR Repertoire Analysis in Mouse SLE Model

A mouse model of SLE will be used to study the relationship between TCR/BCR repertoire and disease activity. The mouse model will be the B6 with the sle1 and sle3 loci from NZM2410. These B6.sle1.sle3 (BSS) mice develop SLE-like nephritis in a spontaneous fashion. Three types of cohorts will be studied. For all study points, blood BUN, creatinine, and anti-nuclear autoantibodies, as well as urine protein and creatinine levels, will be obtained. It will be determined whether a score generated from blood TCR/BCR repertoire correlates well with these measured indices of kidney disease. The first cohort will be similar to the human cohort described, where longitudinal blood samples will be collected along with kidney function assessment. Specifically, 7 BSS mice will be followed on a monthly basis until month 8. At the end, these mice will be sacrificed and, in addition to blood, spleen and kidney tissue will be analyzed. As a control, 5 B6 mice will be assessed in a similar manner. The second cohort will be cross sectional, where different groups of animals will be sacrificed at specific times and spleen, kidney, and blood samples will be analyzed at that time. Specifically, 5 BSS mice will be sacrificed each month and blood, spleen, and kidney will be analyzed. As a control, two B6 control mice will be assessed in the same fashion. Finally, a third cohort will be treated with steroids after disease onset, with nephritis assessment and blood samples obtained on a regular basis after that. Specifically, at 4 months of age, 20 mice that have the disease will be treated with steroids, and then on a biweekly basis for the next 4 months blood will be taken for TCR/BCR repertoire analysis and kidney function assessment. As a control, 5 BSS mice will be treated with placebo and followed in a similar fashion. TCR and BCR repertoire analysis will be performed from all the study points (i.e. different time points and different tissues for the same time point). The analysis will involve 2 stage PCR, sequencing processing, and primary data analysis as described above.

Example 17 Identification and Dynamics of Clonotypes that Correlate with Mouse SLE

First, a set of clonotypes that correlate with renal function will be identified. As a measure of renal function, urine protein/creatinine ratio, serum creatinine, or BUN levels can be used. In the first and third cohorts, the correlation of the blood level of each HPC clonotype with each of the three measures can be assessed. In a similar manner to what is described in humans, it can be assessed whether there is a great increase in the number of clonotypes with high correlation to 1, 2, or all 3 of the renal function measures over random expectation (or permutation testing). Given that random expectation, the correlation threshold will be picked where only 10% of the clonotypes with a correlation level above that threshold are expected to have the observed correlation level by chance (10% false discovery). These clonotypes will be focused on, and this set will be defined as “correlating clonotypes”.
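One way the permutation-based threshold described above could be computed is sketched below. It is an illustration under stated assumptions rather than the method of the study: Pearson correlation is used, only positive correlations are considered, a single renal measure is tested, and the names `pearson` and `fdr_threshold` as well as the toy data are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def fdr_threshold(levels, renal, n_perm=200, target_fdr=0.10, seed=0):
    """Pick the lowest correlation threshold whose estimated false
    discovery rate stays at or below target_fdr.

    levels: {clonotype: [level at each study point]}
    renal:  [renal measure at each study point]
    """
    rng = random.Random(seed)
    observed = {c: pearson(v, renal) for c, v in levels.items()}
    null = []  # correlations obtained after shuffling the renal measure
    for _ in range(n_perm):
        shuffled = renal[:]
        rng.shuffle(shuffled)
        null.extend(pearson(v, shuffled) for v in levels.values())
    best = None
    # Walk candidate thresholds from strictest to loosest and stop at
    # the first one whose estimated FDR exceeds the target.
    for t in sorted(set(observed.values()), reverse=True):
        n_obs = sum(1 for r in observed.values() if r >= t)
        expected_false = sum(1 for r in null if r >= t) / n_perm
        if expected_false / n_obs <= target_fdr:
            best = t
        else:
            break
    if best is None:
        return None, set()
    return best, {c for c, r in observed.items() if r >= best}

# Toy data: c1 tracks the renal measure exactly, c2 is noise, c3 is flat.
renal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
levels = {
    "c1": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0],
    "c2": [5.0, 1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0],
    "c3": [3.0] * 8,
}
threshold, correlating = fdr_threshold(levels, renal)
print(threshold, sorted(correlating))
```

With real data the same procedure would be repeated for each of the three renal measures, and the number of permutations would be increased.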

In addition to this statistical method to identify correlating clonotypes, clonotypes relevant to disease might be identified by a “functional” method of enrichment of specific clonotypes in kidney tissue. By the functional method a set of clonotypes may be identified in cohort 2 that may be relevant to disease, and these will be called functionally-identified correlating clonotypes. The extent of overlap between the “statistical” definition and the “functional” definition of correlating clonotypes can be assessed. Cohorts 1 and 3 have kidney samples collected at the last time point. It can be assessed whether clonotypes enriched in these kidney samples are present in the blood and are among the clonotypes with higher correlation with renal function.

The dynamics of correlating clonotypes (statistically and functionally identified) can then be evaluated. For example, using data from cohort 2, the time course of the rise and fall (if any) of their levels will be evaluated in the three compartments: kidney, blood, and spleen.

In the statistically identified correlating clonotypes, a subset of the correlating clonotypes would be identified by virtue of their correlation with renal function. The correlating clonotypes can be identified without knowing the renal function data. In other words, the characteristics that distinguish the correlating clonotypes from those that are irrelevant to disease can be understood. In order to do that, a set of clonotypes with low correlation to renal function will be identified as control non-correlating clonotypes.

Characteristics of Clonotypes that Correlate with Disease

After identification of the two sets of clonotypes, correlating and non-correlating, characteristics that distinguish these two sets will be searched for. Separate and combined analyses using the correlating clonotypes identified statistically and functionally will be performed. The same type of characteristics studied in humans will be assessed, for example the level of the clonotype, the presence of particular sequence motifs, and the sequence of other related clonotypes. As described for the human study, there is a significant risk of overfitting, and hence a cross validation technique or separate training and testing sets need to be employed.

One utility of the mouse experiment is the availability of cells, allowing for assessment of whether correlating clonotypes are enriched in a specific subtype of cells. It will be studied whether correlating clonotypes are enriched in some cell subtypes; sequencing from the full set of lymphocytes and from the specific subtype where correlating clonotypes are enriched can be done, and this criterion of enrichment can be used as an extra characteristic to distinguish correlating clonotypes from other disease-irrelevant clonotypes. In order to know in which cell subtypes clonotypes are enriched, a couple of approaches will be taken: hypothesis driven and hypothesis free. The first is to try a dozen candidate surface markers on T or B cells in a set of samples. For example, one candidate is CD69 on T cells to select activated T cells. For B cells, studies have shown an increase of CD27^(high) cells in active SLE, and therefore that is a good candidate for a marker of cells that may have enrichment of the correlating clonotypes. In each of these experiments, the specific cell subtype is purified through FACS. Then a sequencing reaction is done for cDNA from the full complement of the lymphocytes as well as for cDNA from the lymphocytes that were purified by FACS from a collection of different samples. It will be assessed whether the two sets of correlating and non-correlating clonotypes are present in different proportions in the full complement of lymphocytes compared to the FACS purified subset. Markers that have a large difference can be useful in identifying correlating clonotypes. Enrichment of clonotypes in subtypes of cells with these markers will be used in addition to the sequence parameters to detect correlating clonotypes.
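The comparison of proportions between the full lymphocyte complement and a FACS purified subset can be expressed as an enrichment ratio. The sketch below assumes clonotype frequencies have already been derived from each sequencing reaction; the function names and the toy frequencies are hypothetical.

```python
def set_fraction(frequencies, clonotypes):
    """Total frequency contributed by a set of clonotypes in one sample."""
    return sum(frequencies.get(c, 0.0) for c in clonotypes)

def enrichment_ratio(full, purified, clonotypes):
    """Fraction in the FACS purified subset divided by the fraction in
    the full lymphocyte complement; values above 1 indicate enrichment."""
    base = set_fraction(full, clonotypes)
    if base == 0:
        raise ValueError("clonotype set absent from the full complement")
    return set_fraction(purified, clonotypes) / base

# Toy clonotype frequencies from the two sequencing reactions.
full = {"A": 0.02, "B": 0.01, "X": 0.03, "Y": 0.94}
purified = {"A": 0.20, "B": 0.10, "X": 0.03, "Y": 0.67}

correlating = {"A", "B"}
non_correlating = {"X"}

corr_ratio = enrichment_ratio(full, purified, correlating)
non_ratio = enrichment_ratio(full, purified, non_correlating)
print(corr_ratio, non_ratio)
```

A marker for which the correlating set shows a much larger ratio than the non-correlating set would be a candidate for the extra distinguishing characteristic described above.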

In the hypothesis free approach, markers will be searched for which are differentially expressed in cells with a correlating clonotype compared with other cells. A few cases will be chosen where a specific TCR clonotype is clearly correlating with disease, and cases will be picked where that clonotype is so highly enriched that it represents the majority of the clonotypes with the same V segment. FACS will be done using antibody to the specific V segment (antibodies against all V segments are commercially available) to select a population that is highly enriched for cells carrying the correlating clonotype. The RNA can be prepared from these cells and the expression of all the genes can be studied by performing an array experiment. As a control, total RNA from lymphocytes can be used and/or RNA from FACS purified cells carrying another irrelevant V segment. Markers that maximally distinguish the sample obtained from the FACS purified V segment with the correlating clonotype from the controls can be searched for. Markers, including surface markers (since it is much easier to do FACS with surface proteins), that distinguish the two populations can be found. If a consistent RNA marker from samples of several mice is observed, it will be validated at the protein level. Using the same samples, antibodies against the marker protein will be used in a FACS assay to purify cells carrying the marker protein. More than one marker may be tested to increase the chance of validating one of them. The TCR and/or BCR from the purified cells will be sequenced. If the RNA results hold at the protein level, then the correlating clonotypes should be enriched in the purified subset of cells. After validating that RNA results still hold at the protein level, the results will be validated in other samples. Samples that were not subject to the array analysis will be subjected to FACS analysis using the antibody to the marker protein(s). The TCR and/or BCR of the purified cells will be sequenced. It will be evaluated whether the correlating clonotypes are enriched in the cells purified using antibody to the specific marker(s). This will validate the utility of the marker(s) in the identification of correlating clonotypes.

Example 18 Use of IgH and TCRβ Repertoire to Measure Disease Activity

The algorithm for correlating clonotypes from above can be applied to identify correlating clonotypes in all samples of cohorts 1 and 3 by virtue of their sequence and/or markers. Using the level of the correlating clonotypes in each patient, an AI score can be generated that correlates with a measure of renal function. As described above, there is an overfitting risk, and the cross validation technique and/or separate training and testing sets need to be employed. The correlation of AI and renal function measures can be evaluated in a cross sectional manner (all study points of all mice). The question of whether the AI score changes in an individual mouse when renal function changes can also be evaluated. This can be done by comparing the AI from high and low renal function in the same animal in a manner similar to what is described in humans.

Example 19 Linking of Sequences from the Same Cell

Two sequences can be amplified from the same cell, and during amplification they can be linked to form one amplicon. Information on the presence of these two sequences in the same cell can then be preserved even if the linked sequences are mixed with a pool of sequences from other samples.

An example of the utility of this linking scheme is the assessment of the diversity of TCRs. The diversity of TCRs is generated from the diversity of each of TCRα and TCRβ. In addition, the combination of a TCRα and a TCRβ in a cell adds significantly to the diversity. However, when nucleic acids are extracted from a sample with a plurality of T-cells, the information of which TCRα is present in the same cell as which TCRβ is lost. A method that allows the preservation of this information is presented here. This method comprises separating the cells into distinct compartments, amplifying the desired sequences in a way that covalently links initially separate amplicons, and optionally mixing all the amplified sequences for later analysis. Several methods can be conceived to place each cell in a compartment. For example, one method is to put cells in a microdroplet or a micelle emulsion that can be used in PCR. These droplets can be filled in a directed manner or randomly filled in such a way that most droplets contain at most a single cell. Also, cell sorting can be used to place a single cell in a PCR container. Amplification of nucleic acid can then be performed in each droplet.
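For random filling, the chance that a droplet holds more than one cell can be estimated with a simple occupancy model. The sketch below assumes that the number of cells per droplet follows a Poisson distribution, which is a common modeling assumption and is not stated in this description; the mean loading of 0.1 cells per droplet is likewise only an illustrative value.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a droplet under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def droplet_occupancy(mean_cells_per_droplet):
    """Fractions of empty, single-cell and multi-cell droplets (Poisson assumption)."""
    if mean_cells_per_droplet <= 0:
        raise ValueError("mean loading must be positive")
    empty = poisson_pmf(0, mean_cells_per_droplet)
    single = poisson_pmf(1, mean_cells_per_droplet)
    return {
        "empty": empty,
        "single": single,
        "multiple": 1.0 - empty - single,
        # Among droplets that contain any cell, the share holding exactly one.
        "single_among_occupied": single / (1.0 - empty),
    }


# Illustrative loading: on average one cell per ten droplets.
occupancy = droplet_occupancy(0.1)
```

Under this assumption, lowering the mean loading reduces the share of multi-cell droplets at the cost of more empty droplets, which is the trade-off behind filling so that most droplets contain at most a single cell.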

Scheme 1

As illustrated in FIG. 9, sequence 1 can be amplified using primer 1 and primer 2. Primer 2 carries a 5′ overhang sequence that is not complementary to the genomic sequence (FIG. 9A, thin line). Similarly, sequence 2 can be amplified using primer 3 and primer 4 (FIG. 9A). Primer 3 carries a 5′ overhang sequence that is complementary to the overhang sequence of primer 2 (FIG. 9A, thin dashed line). In this figure the two overhangs (or the two linking sequences), representing two complementary sequences, are drawn in thin lines; one sequence is shown as a solid line and its complement is shown as a dashed line. Other complementary sequences are drawn in the same solid colors: black and grey for sequence 1 and sequence 2, respectively.

After amplification with primers 1-4, each of the two amplification products has a linking sequence on one end, and the two products can anneal to each other and the strands can be extended to form a full double stranded molecule (FIG. 9B). This molecule now has sequence 1 and sequence 2 linked to each other and can then be amplified with primers 1 and 4 (FIG. 9C).
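The linking step of Scheme 1 can be mimicked on plain strings. The sketch below is only a toy model under stated assumptions: strands are written 5′ to 3′ as top strands, annealing is treated as an exact match of the shared linking sequence, and all sequences, including the linking sequence, are hypothetical placeholders rather than real primers or TCR segments.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def link_products(product1_top, product2_top, linker):
    """Model annealing via the shared linking sequence followed by strand extension.

    product1_top must end with the linker and product2_top must start with it;
    the result is the top strand of the linked molecule.
    """
    if not product1_top.endswith(linker) or not product2_top.startswith(linker):
        raise ValueError("products do not share the linking sequence")
    return product1_top + product2_top[len(linker):]


# Hypothetical sequences (placeholders, not real targets).
SEQUENCE_1 = "ATGGCCTTAG"
SEQUENCE_2 = "GGATCCAAGT"
LINKER = "GATTACAG"

# Primer 2 is a reverse primer, so its 5' overhang lies on the bottom strand;
# primer 3 is a forward primer whose 5' overhang is the complementary linker.
primer2_overhang = reverse_complement(LINKER)
primer3_overhang = LINKER

# Top strands of the two individual amplicons after amplification with primers 1-4.
product1_top = SEQUENCE_1 + reverse_complement(primer2_overhang)
product2_top = primer3_overhang + SEQUENCE_2

linked_top = link_products(product1_top, product2_top, LINKER)
```

In this model the linked molecule reads sequence 1, then the linking sequence, then sequence 2, so a single sequencing read across the junction would report both sequences as coming from the same compartment.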

All 4 primers can be put in the reaction at the same time to achieve sequence linking and amplification. It may be beneficial to add primers 2 and 3 at low concentration. The low concentration of primers 2 and 3 will ensure that the two individual sequence amplicons reach saturation early in the reaction, allowing the linked amplicon to dominate the PCR in the latter stages of the reaction. This will lead to the final reaction having a high concentration of the linked amplicon relative to the individual sequence amplicons.

Scheme 1(a)

Scheme 1(a) is a variant of Scheme 1 in which the linking sequence is identical to the primer 2 sequence (FIG. 10). Sequence 1 can be amplified using primer 1 and primer 2 with no overhanging sequences on the primers (FIG. 10A). Primer 3 carries a 5′ overhang sequence that is complementary to primer 2 (FIG. 10A). Sequence 2 can be amplified using primer 3 and primer 4, creating a linking sequence that is complementary to sequence 1 (FIG. 10A). Other complementary sequences are drawn in the same colors: black and grey for sequence 1 and sequence 2, respectively.

After amplification with primers 1-4, the two products can anneal to each other via the primer 2 sequence and the strands can be extended to form a full double stranded molecule (FIG. 10B). This molecule now has sequence 1 and sequence 2 linked to each other and can then be amplified with primers 1 and 4 (FIG. 10C).

Scheme 2

Scheme 2, shown in FIG. 11, is similar to scheme 1 except that the ultimate amplification is achieved with sequences that are not complementary to the genome. One advantage of this approach is that the priming sequences can be chosen to be ideal for amplification with no off target amplification. This can be helpful in cases where primers complementary to the genomic sequence to be amplified are not ideal. By using primers not complementary to the genome for amplification, a low concentration of primers 1-4 can be used, minimizing off target amplification. Also, scheme 2 can be adapted to a multiplexing scheme in which more than one pair of primers is used without causing as many primer interactions. Each pair of sequences to be linked will have its own unique 4 primers, which need not be at high concentration. One pair of amplification primers can amplify all the pairs of linked sequences (FIG. 11C).

Sequence 1 can be amplified using primer 1 and primer 2 (FIG. 11A). Primers 1 and 2 carry on their 5′ ends distinct overhang sequences that are not complementary to the genomic sequence (FIG. 11A, dotted and thin lines, respectively). Similarly, sequence 2 can be amplified using primer 3 and primer 4 (FIG. 11A). Primers 3 and 4 carry on their 5′ ends distinct overhang sequences that are not complementary to the genomic sequence (FIG. 11A). The overhangs on primers 1 and 4 are labeled “Amp 1” (dotted) and “Amp 2” (wavy) and are sequences not complementary to the genome that are ultimately used for amplification (FIG. 11A). Analogously to scheme 1, the overhangs of primer 2 (thin) and primer 3 (thin/dashed) are the linking sequences that are complementary to each other. Other complementary sequences are drawn in the same colors: black and grey for sequence 1 and sequence 2, respectively.

After amplification with primers 1-4, each of the two amplification products has a linking sequence on one end, and the two products can anneal to each other and the strands can be extended to form a full double stranded molecule (FIG. 11B). This molecule now has sequence 1 and sequence 2 linked to each other and can then be amplified with primers 5 and 6 (FIG. 11C).

Optionally, primers 1-4 can initially be used, and after the linking of the two sequences, primers 5 and 6 can be added. A more preferred embodiment will have all the primers added in the first step. A yet more preferred embodiment will have all the primers present initially, with the concentration of primers 1-4 lower than that of primers 5 and 6. This allows the full linking and amplification to occur in one step. The low concentration of primers 2 and 3 will ensure that the two individual sequence amplicons reach saturation early in the reaction, allowing the linked amplicon to dominate the PCR in the latter stages of the reaction. This will lead to the final reaction having a high concentration of the linked amplicon relative to the individual sequence amplicons. Furthermore, the low concentration of primers 1-4 minimizes any off target amplification that could occur if these primers are of lower quality than primers 5 and 6.

The use of primers 5 and 6 for amplification enables more efficient multiplexing (FIG. 12). One pair of primers (primers 5 and 6) can be used to amplify all the linked sequences. The linking sequences can be designed in different ways for different applications. The example illustrated in FIG. 12 is for two pairs of sequences to be linked, but this scheme can be extended further to 10's, 100's, or 1000's of sequences. If there is a set of gene pairs to be linked (e.g., TCRα with TCRβ and IgH with IgK), then the linking sequences for each pair can be different. In this example the linking sequences for TCRα and TCRβ will be different from those of IgH and IgK, as depicted by thick dashed lines (TCRα with TCRβ) or thin dashed lines (IgH with IgK) (FIG. 12A). All the amplified sequences in this example are shown in the same color. The amplification primers for all the linked sequences will be the same primers, 5 and 6, as depicted in FIG. 11. In other applications the same linking sequences can be used if there is no specific pairing.

It is also conceived that more than 2 sequences can be linked. For example, 3 or more sequences can be linked together (FIGS. 13A-13D). To create a molecule that links 3 sequences, one of the products can have two different linking sequences on its ends, each linking with one other product (FIG. 13A). In the depicted example, sequence 2 has two linking sequences. The linking sequence of primer 3 allows the linking to sequence 1 through the linking sequence of primer 2 (linking sequence complementary pair 1, LS1). Similarly, the linking sequence of primer 4 allows the linking to sequence 3 through the linking sequence of primer 5 (linking sequence complementary pair 2, LS2) (FIG. 13A). In another cycle the whole of sequence 2 becomes a linking sequence to link sequence 1 and sequence 3. The Amp 1 and Amp 2 sequences complementary to primers 1 and 6 enable amplification after formation of a molecule with linked sequences 1-3.

Example 20 Monitoring for Metastatic Recurrence in Colon Cancer Patients

Many cancers that are detected at a treatable stage still carry an ongoing risk to the patient of metastatic tumor recurrence. Such recurrences are often detected late and at untreatable stages and can be fatal to the patients. One example of such a situation is that of recurrent colon cancer. Despite increasingly aggressive colon cancer screening programs, colon cancer represents one of the most common malignancies in the US. Approximately 150,000 patients per year are diagnosed with colon cancer at serious but treatable stages (Stage II and Stage III). These patients are treated by tumor resection followed by a course of chemotherapy. While these treatments are generally effective, there is nonetheless a significant chance that these patients will have metastatic recurrences of the primary tumor in the years following treatment. For instance, 50% of Stage III patients will have a recurrence within 5 years of surgery. These recurrences can be either isolated (e.g., in the colon or liver) or multifocal. In either case, but particularly if they are isolated, detecting them at an early stage can play a role in maximizing the chances of successful therapy (surgery and/or chemotherapy).

There are currently two tests used in post treatment surveillance. CT scans of the abdomen and chest are used to identify tumors visible on these images. Typically these scans are done at intervals of 6-12 months for the first 5 years post therapy. While these scans can reveal early stage malignancies, their clinical effectiveness is in debate. Drawbacks of these scans include their significant expense and the fact that they subject the patients to significant amounts of radiation, which can itself cause further tumors. The other test, CEA testing, is blood based and has been shown to have some value. This antibody test measures the level of a protein in serum that is specific to some colon tumors. The drawback of CEA testing is its lack of sensitivity (<60% of patients with positive CT scans have a positive CEA test).

In this embodiment of the invention, lymphocytes obtained from the resected primary tumor are used to develop an immune profile that can be used to add sensitivity to a blood based test for early cancer recurrence. TCRs (and/or BCRs) of the lymphocytes found in the resected tumor can be amplified and sequenced. Clonotypes that are enriched in the tumor sample are likely relevant to the immune response to the tumor. Subsequent blood draws from the patient can be used to assess the level of these clonotypes. A rise in the level of these clonotypes can signal an immune response to a tumor recurrence. In this case the detection of the immune response may be more sensitive than the detection of the tumor marker itself.

Discovery Study for the Detection of Cancer Recurrence Using a Calibration Test

It is conceived that a discovery study can be performed to determine the likelihood of detection of recurrence given the profile of blood TCR (and/or BCR). Resected tumor samples as well as follow up blood samples of patients with known outcome can be used for this study. TCR (and/or BCR) from all these samples can be sequenced. Candidates for the correlating clonotypes are those that are present in the TCR (and/or BCR) data from the tumor samples. Given the known outcomes in this training study, one can devise, using standard cross validation techniques, a model that generates a score (Recurrence Risk) given the level of the different clonotypes. This Recurrence Risk score can thus be calculated in a new patient by measuring the clonotypes in the resected tumor (calibration point) and the data from the clonotypes found in the same patient's blood at a later time during the surveillance for recurrence. The use of the tumor data allows a great reduction in the number of clonotypes present in blood that are considered in this analysis.
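The training and cross validation step can be sketched in a deliberately simplified form. The sketch assumes that each patient is summarized by a single number, the summed blood frequency of the clonotypes found in that patient's resected tumor, and that the model is a plain threshold on that number evaluated by leave-one-out cross validation. The feature, the threshold model, the function names, and the data are illustrative assumptions; the description above does not specify the form of the model.

```python
def tumor_clonotype_level(blood_freqs, tumor_clonotypes):
    """Summed blood frequency of clonotypes that were seen in the resected tumor."""
    return sum(f for clonotype, f in blood_freqs.items() if clonotype in tumor_clonotypes)


def best_threshold(levels, recurred):
    """Threshold (chosen among observed levels) with the highest training accuracy."""
    def accuracy(threshold):
        return sum((level >= threshold) == outcome for level, outcome in zip(levels, recurred))
    return max(sorted(set(levels)), key=accuracy)


def leave_one_out_accuracy(levels, recurred):
    """Hold out each patient in turn, fit the threshold on the rest, test on the held-out one."""
    correct = 0
    for i in range(len(levels)):
        train_levels = levels[:i] + levels[i + 1:]
        train_outcomes = recurred[:i] + recurred[i + 1:]
        threshold = best_threshold(train_levels, train_outcomes)
        correct += (levels[i] >= threshold) == recurred[i]
    return correct / len(levels)


# Hypothetical training cohort: one level per patient and the known outcome.
levels = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
recurred = [False, False, False, True, True, True]
cv_accuracy = leave_one_out_accuracy(levels, recurred)

# Computing the level for one hypothetical patient from blood frequencies.
patient_level = tumor_clonotype_level(
    {"cloneA": 0.02, "cloneB": 0.01, "cloneC": 0.5}, {"cloneA", "cloneB"}
)
```

Evaluating the model only on patients that were held out of the fit is what guards against the overfitting risk; a real study would likely use several features and a richer model, with the same hold-out discipline.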

Discovery Study for the Detection of Cancer Recurrence Using a Calibration Test and a Population Study

It is likely that not all clonotypes that are enriched in the tumor specimen are relevant to the immune response to the tumor. There might be some lymphocytes that expanded locally due to a favorable inflammatory condition. In another embodiment of this invention the discovery study can be done using the same samples, but the study is used to identify parameters that distinguish “correlating” from “non-correlating” clonotypes. These parameters can include 1) Sequence motif: the motif can be a specific V or J region, a VJ combination, or short sequences in the DJ region that are associated with a clonotype being correlating; 2) Size of the clonotype; 3) Level: absolute level (number of reads per million) or rank level; 4) Similarity to other clonotypes: the presence of other highly related clonotypes, like those with silent changes (nucleotide differences that code for the same amino acids) or those with conservative amino acid changes; 5) For the BCRs, the level of somatic mutations in the clonotype and/or the number of distinct clonotypes that differ by somatic mutations from some germline clonotype; and 6) Presence in a cell carrying a specific marker. This study will then result in an algorithm that can predict which clonotypes are likely to be correlating with cancer recurrence in blood given a specific set of clonotypes present in a given tumor sample. These clonotypes can then be used to develop a score of Recurrence Risk in the same manner as described above.
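Parameter 4, similarity through silent changes, lends itself to a short worked example. The sketch below assumes that clonotypes are given as in-frame nucleotide strings of equal length and uses the standard genetic code; the function names and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate an in-frame nucleotide string into one-letter amino acid codes."""
    if len(nucleotides) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[nucleotides[i:i + 3]] for i in range(0, len(nucleotides), 3))


def differ_only_by_silent_changes(clonotype_a, clonotype_b):
    """True if the two sequences differ at the nucleotide level but encode the same peptide."""
    return (
        clonotype_a != clonotype_b
        and len(clonotype_a) == len(clonotype_b)
        and translate(clonotype_a) == translate(clonotype_b)
    )


# Hypothetical clonotype fragments.
reference = "TGTGCCAGC"          # encodes C-A-S
silent_variant = "TGTGCTAGC"     # GCC -> GCT, still alanine
coding_variant = "TGTGACAGC"     # GCC -> GAC, alanine -> aspartate

is_silent = differ_only_by_silent_changes(reference, silent_variant)
is_coding = differ_only_by_silent_changes(reference, coding_variant)
```

The number of such silent relatives of a clonotype in a sample could then serve as one input feature to the algorithm; conservative amino acid changes would need a separate similarity rule that is not sketched here.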

Discovery Study for the Detection of Cancer Recurrence Using a Population Study

In another embodiment of this invention, the clonotypes measured in the resected tumor are used to generate a model that predicts correlating clonotypes in as yet unseen samples. This model can also be used to generate a Recurrence Risk score in a manner analogous to that described above. In this model there would be no need to measure the clonotypes in the resected cancer tissue in a new patient undergoing recurrence surveillance; rather, the Recurrence Risk could be assessed by simply measuring the clonotypes in a given blood sample.

Discovery Study for the Detection of Primary Colon Cancer Using a Population Study

As an extension, it is conceived that detection of primary cancers can be achieved using the same methodology. With primary cancers there is no resected tumor that can be used to enrich for relevant clonotypes. However, even in the presence of tumor resection data, it is conceived that additional sequence and other parameters need to be used to identify relevant clonotypes and ultimately generate a score for likelihood of cancer detection. Therefore, by extension, if the algorithm is predictive enough, one is able to detect the cancer from blood (or other bodily fluid) without the data from the resected tumor. In this embodiment of the invention, a discovery study with blood samples from patients preceding their diagnosis of primary cancer needs to be available. In a fashion analogous to the one described above, parameters (sequence and other) can be identified to predict the clonotypes that are correlated with the immune system response to the tumor. A model can then be used to generate a Cancer Risk score that predicts the progression risk to colon cancer. This algorithm can then be applied to a new patient's blood sample to measure the risk of primary colon cancer.

Example 21 Monitoring for Rejection in Heart Transplant Patients

Heart transplants are a relatively uncommon procedure as the supply of organs is very limited. 3,500 heart transplants are performed every year worldwide. Each procedure is very expensive and the organs that are used are priceless. As a result, the patients that receive these organs are treated extremely proactively. In order to measure the state of the immune reaction to the donated organ at a time at which interventions with immunosuppressants can be effective, patients are given periodic heart biopsies to measure inflammation of the organ. Based on these tests, aggressive courses of immunosuppressants may be given. These procedures have several limitations. As invasive surgical procedures they have risks to the patient. Furthermore, they are expensive and can only be done at infrequent intervals. A blood based test based on profiling the expression of a panel of 11 test genes (Allomap) has been shown to be quite sensitive in detecting organ rejection but lacks sufficient sensitivity to be used as a replacement for biopsy and is instead used to decide when to do a biopsy. In one embodiment of this invention TCR (and/or BCR) profiles are used to assess the state of “rejection” and generate a Rejection Risk score that predicts the likelihood of rejection in a specific time frame. It is conceived that a discovery study can be performed to determine the likelihood of rejection given the profile of blood TCR (and/or BCR). This can be used in the clinic to inform the immunosuppressive therapies that are being used.

Discovery of Correlating Clonotypes Using a Population Study

In this embodiment of the invention a population of post transplant patients with blood samples and known clinical outcome can be used. TCR (and/or BCR) from all these samples can be sequenced, and correlation of individual clonotypes with rejection outcome can be used to distinguish correlating from non-correlating clonotypes. Subsequently, parameters can be derived that distinguish those two classes of clonotypes. These parameters can include 1) Sequence motif: the motif can be a specific V or J region, a VJ combination, or short sequences in the DJ region that are associated with a clonotype being correlating; 2) Size of the clonotype; 3) Level: absolute level (number of reads per million) or rank level; 4) Similarity to other clonotypes: the presence of other highly related clonotypes, like those with silent changes (nucleotide differences that code for the same amino acids) or those with conservative amino acid changes; 5) For the BCRs, the level of somatic mutations in the clonotype and/or the number of distinct clonotypes that differ by somatic mutations from some germline clonotype; and 6) Presence in a cell carrying a specific marker. An alternative or supplemental method to define the correlating and non-correlating clonotypes is available if the study includes biopsy samples of the graft, particularly if the graft was in active rejection. It is expected that at that time there will be great enrichment of the correlating clonotypes. Parameters to distinguish these from the other clonotypes can be identified as discussed above.

The profile data from the blood samples is then used to predict the likelihood of rejection. Given the known outcomes in this training study, one can devise, using standard cross validation techniques, a model that generates a Rejection Risk score given the level of the different clonotypes. Given the profile of TCR (and/or BCR) in a new blood sample at a specific point, a Rejection Risk score relating to the likelihood of rejection can be generated.

Discovery of Correlating Clonotypes Using a Calibration Test

In another embodiment a method of identifying correlating clonotypes can be implemented using a calibration test for each patient. This method involves a first biopsy sample being taken post transplant. The presence of biopsy material of the graft post transplant offers the possibility of analyzing TCRs from the biopsy sample to identify the correlating clonotypes, defined as those that are prevalent in this sample. This set of clonotypes can then be followed in blood and a score is generated for the likelihood of rejection. The algorithm to generate the Rejection Risk score is derived through a discovery study, similar to the one described above, that utilizes the available clinical data and the levels of the correlating clonotypes to generate a Rejection Risk score that approximates the likelihood of rejection.

In this embodiment a specific calibration test will be done using material from a first biopsy post transplant, but further biopsies could be replaced by the use of blood samples whose clonotypes could be used along with this calibration test to measure a Rejection Risk score.

In addition to the graft biopsy, one can use blood samples taken before transplant as another calibration point. Clonotypes that are prevalent in this sample are unlikely to be related to the rejection; rather, they represent the history of prior antigens the patient has seen. Therefore, when considering the blood samples after transplant, one can subtract the clonotypes that were present before the transplant in determining the correlating clonotypes. These clonotypes can then be used to generate a model of Rejection Risk.
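The subtraction of pre-transplant clonotypes can be expressed as a simple filter. The sketch assumes that each sample is a table of clonotype frequencies; the optional restriction to clonotypes seen in the graft biopsy, the `min_pre_freq` cutoff, the function name, and all values are illustrative assumptions.

```python
def candidate_rejection_clonotypes(post_freqs, pre_freqs, biopsy_clonotypes=None,
                                   min_pre_freq=0.0):
    """Post-transplant clonotypes left after removing those present before transplant.

    post_freqs / pre_freqs map clonotype -> frequency in the respective blood sample.
    If biopsy_clonotypes is given, only clonotypes also seen in the graft biopsy are kept.
    """
    present_before = {c for c, f in pre_freqs.items() if f > min_pre_freq}
    kept = {c: f for c, f in post_freqs.items() if c not in present_before}
    if biopsy_clonotypes is not None:
        kept = {c: f for c, f in kept.items() if c in biopsy_clonotypes}
    return kept


# Hypothetical frequencies (fraction of reads) in blood before and after transplant.
pre_transplant = {"cloneA": 0.010, "cloneB": 0.002}
post_transplant = {"cloneA": 0.012, "cloneC": 0.004, "cloneD": 0.001}
graft_biopsy = {"cloneA", "cloneC"}

blood_only = candidate_rejection_clonotypes(post_transplant, pre_transplant)
with_biopsy = candidate_rejection_clonotypes(post_transplant, pre_transplant, graft_biopsy)
```

Raising `min_pre_freq` above zero would subtract only clonotypes that were prevalent before transplant rather than every clonotype that was detected; which choice is appropriate is not settled by this description.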

In this embodiment, two calibration tests can be used: one prior to transplant and one from a biopsy after transplant. These calibrations could then be used along with clonotypes derived from a blood test to measure Rejection Risk.

Discovery of Correlating Clonotypes Using a Calibration Test and a Population Study

In another embodiment, the identification of the correlating clonotypes can be achieved through a combination of the above approaches. Specifically, this can be achieved by using the population study to generate an algorithm to predict correlating clonotypes. In addition, it can be achieved through calibration data from the same patient using graft biopsy and/or blood samples pre-transplant. A more preferred embodiment will employ both approaches, the population-built algorithm and individual calibration, to most accurately identify the correlating clonotypes. A Rejection Risk score is then generated using the level of these clonotypes to predict the likelihood of rejection through the use of the population study as a training set.

In this embodiment, two calibration tests can be used: one prior to transplant and one from a biopsy after transplant. These calibrations could then be used along with clonotypes derived from a blood test to measure Rejection Risk.

The prediction of GVHD can be done in a very similar manner, with the same concept of the population study to generate an algorithm to predict correlating clonotypes. Also, the “negative” calibration can be generated from the donor sample pre-transplantation. An approach using both the algorithm and calibration is likely to be more predictive of the correlating clonotypes. An algorithm to compute a score of the likelihood of GVHD given the level of the correlating clonotypes can be generated using a population study in a manner as described above. This algorithm can then be used for the prediction of the likelihood of GVHD in the next set of patients.

Example 22 Monitoring for PML Infection in MS Patients Treated with Natalizumab

One embodiment of the invention uses TCR and/or BCR profiles to detect subclinical Progressive Multifocal Leukoencephalopathy (PML) in MS patients. PML is a serious and often fatal disease that causes often rapidly progressive demyelination by killing the oligodendrocytes that synthesize myelin. It is caused by JC virus, which is present in a latent phase in the majority of the population. In a fraction of the immunosuppressed population (e.g., AIDS patients) the virus is reactivated, leading to the development of this serious disease. In addition, some patients who are being immunosuppressed through the use of medication, such as post transplant patients, can also develop PML. Some specific medications have been linked to the risk of PML in specific patient populations. For example, natalizumab (Tysabri) was associated with the development of more than 10 cases of PML among patients with multiple sclerosis (MS), leading to its withdrawal from the market for a period of time. Natalizumab is well accepted to be more effective than the other FDA approved medications for multiple sclerosis, but its use has been limited by the fear of PML development. Once PML is suspected, plasmapheresis can be performed to reduce the concentration of the drug in the patient. The overlap between symptoms of MS and PML can sometimes delay the detection of PML. Early detection of subclinical PML is urgently needed.

These clonotypes may be discerned from blood samples from a population where some patients developed PML. This population can be used to identify clonotypes that correlate with the later development of PML. With the availability of these clonotypes, an algorithm to identify parameters that distinguish these from other clonotypes can be generated.

Discovery of Correlating Clonotypes Using a Population Study

In this case an algorithm is generated to predict the clonotypes that are relevant to the emergence of PML. The algorithm can be trained on a set of clonotypes deemed to be correlating with the disease. In this embodiment of the invention, blood (or other body fluid) samples in a discovery study from a population of patients with a latent infection with JC virus, some of whom go on to develop PML, can be used. TCR (and/or BCR) from all these samples can be sequenced, and correlation of individual clonotypes with infectious agent reactivation outcome can be used to distinguish correlating from non-correlating clonotypes. Parameters that distinguish those two classes of clonotypes can be identified. These parameters can include 1) Sequence motif: the motif can be a specific V or J region, a VJ combination, or short sequences in the DJ region that are associated with a clonotype being correlating; 2) Size of the clonotype; 3) Level: absolute level (number of reads per million) or rank level; 4) Similarity to other clonotypes: the presence of other highly related clonotypes, like those with silent changes (nucleotide differences that code for the same amino acids) or those with conservative amino acid changes; 5) For the BCRs, the level of somatic mutations in the clonotype and/or the number of distinct clonotypes that differ by somatic mutations from some germline clonotype; and 6) Presence in a cell carrying a specific marker. An alternative or supplemental method to define the correlating and non-correlating clonotypes would come from a set of patients who are mounting an immune response to the same infectious agent. Enriched clonotypes (particularly those that are at a significantly higher level than before the immune response) in these patients can be considered correlating, and parameters that distinguish them from other clonotypes can be identified.

Similarly, the correlating clonotypes can be identified from samples of patients with active PML or from in vitro studies to identify clonotypes that respond to JC virus antigen. The responding clonotypes may originate from one or a plurality of subjects that may be healthy or infected with the infectious agent. These clonotypes can be considered correlating, and parameters that distinguish them from other clonotypes can be identified.

The profile data from the samples in the discovery study is then used to predict the likelihood of reactivation. Given the known outcomes in this training study, one can devise, using standard cross validation techniques, a model that generates a PML Risk score given the level of the different clonotypes. So given the profile of TCR (and/or BCR) in a blood sample at a specific point, a score relating to the likelihood of reactivation can be generated. This algorithm can now be used with data from a novel patient to predict the patient's correlating clonotypes as well as to generate a PML Risk score for the likelihood of reactivation.

In a very similar manner, other infection-related outcomes can be studied. For example, in addition to reactivation of latent infection, one can assess clearance of infection. Furthermore, given the TCR and/or BCR repertoire, one may be able to evaluate the likelihood of having immunity to a specific infectious agent.

Example 23 Monitoring for Reactivation of Latent Infections

In another embodiment TCR and BCR profiling can be used to monitor infections that have periods of acute infection followed by latency and reactivation. Examples of such diseases include Hepatitis B and C as well as infections with Herpes viruses. Predicting such reactivations at an early stage would be desirable.

Discovery of Correlating Clonotypes Using a Calibration Test

In another embodiment a method of identifying correlating clonotypes can be implemented using a calibration test for each patient. The presence of a biological sample from the same patient at a previous time point when the patient was mounting an immune response to the infectious agent can serve to identify the correlating clonotypes. This set of clonotypes can then be followed in blood and a Reactivation Risk score is generated for the likelihood of reactivation. The algorithm to generate the score is derived through a discovery study, similar to the one described above, that utilizes the available clinical data and the counts of the correlating clonotypes to generate a Reactivation Risk score that approximates the likelihood of reactivation. To use this score, a sample taken from a new patient in clinical practice during a period of acute infection is profiled. This data would be used along with a subsequent sample taken during the latent period to measure the Reactivation Risk for clinical purposes.

Discovery of Correlating Clonotypes Using a Calibration Test and a Population Study

In another embodiment, the identification of the correlating clonotypes can be achieved through a combination of the above approaches. Specifically, this can be achieved by using the population study to generate an algorithm to predict correlating clonotypes. The correlating clonotypes can be obtained from a population study of patients with known outcome of the infection, and/or a set of patients with an active immune response to the infectious agent, and/or from in vitro experiments to identify clonotypes reactive with the infectious agent. In addition, it can be achieved through calibration data from the same patient using older data points at the time of an active immune response against the relevant infectious agent. A more preferred embodiment will employ both approaches, the population-built algorithm and individual calibration, to most accurately identify the correlating clonotypes. A Reactivation Risk score is then generated using the level of these clonotypes to predict the likelihood of reactivation through the use of the population study as a training set. To use this score, a sample taken from a new patient in the clinic during a period of acute infection is profiled. This data would be used along with a subsequent sample taken during the latent period to measure the Reactivation Risk for clinical purposes. A similar structure can be employed to study infectious agent clearance and/or immunity to it.

Example 24 Monitoring for Allergic Response During Immunotherapy

Allergic rhinitis is a common condition afflicting ~11% of the US population. This is typically an allergy to pollen or dust. Eliminating the exposure is difficult and involves vigilant effort. The most common treatments used in chronic rhinitis are decongestants, antihistamines, and nasal steroids. In severe cases immunotherapy is done. The goal of the immunotherapy is to de-sensitize the patient. First, a challenge with many potential allergens is done to identify the specific allergen the patient is reacting to. Then the patient is injected with increasing amounts of allergen over a period of months to years until a maintenance dose is achieved, and the treatment is then continued for several years. Typically the patient can feel an improvement in symptoms within 3-6 months, although this can take as long as 12-18 months. A large fraction of the patients, however, do not benefit from the treatment or have relapses. One reason for the slow dose escalation is the risk of anaphylaxis if the patient is given a high dose of allergen before s/he is sufficiently de-sensitized.

In one embodiment of this invention TCR (and/or BCR) profiles are used to assess the state of disease in allergic rhinitis and generate an Allergy Score that predicts how prone the patient is to mount an allergic response should s/he be exposed to the relevant allergen. It is conceived that a discovery study can be performed to determine the likelihood of allergy response given the profile of blood TCR (and/or BCR). This can be used in tailoring the immunotherapy treatment. Possible clinical decisions are to discontinue the treatment if it is deemed ineffective, continue the injection regimen, or accelerate the treatment to reach the maintenance dose faster.

Discovery of Correlating Clonotypes Using a Population Study

In this embodiment of the invention a population of allergic rhinitis patients on immunotherapy with blood samples with known clinical outcome can be used. TCR (and/or BCR) from all these samples can be sequenced and correlation of individual clonotypes with allergy outcome can be used to distinguish correlating from non-correlating clonotypes. Subsequently, parameters can be derived that distinguish those two classes of clonotypes. These parameters can include 1) Sequence motif: The motif can be a specific V or J region, a combination VJ, or short sequences in the DJ region that are associated with a clonotype being correlating; 2) Size of the clonotype; 3) Level: Absolute level (number of reads per million) or rank level; 4) Similarity to other clonotypes: the presence of other highly related clonotypes, like those with silent changes (nucleotide differences that code for the same amino acids) or those with conservative amino acid changes; 5) For the BCRs, the level of somatic mutations in the clonotype and/or the number of distinct clonotypes that differ by somatic mutations from some germline clonotype; 6) Presence in a cell carrying a specific marker. An alternative or supplemental method to define the correlating and non-correlating clonotypes would use biopsy of positive allergy test material from patients positive for a specific allergen. At the site of injection of the allergen it is expected that there will be great enrichment of the correlating clonotypes. Parameters to distinguish these from the other clonotypes can be identified as discussed previously.
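By way of illustration only, the step of correlating individual clonotypes with allergy outcome might be sketched as follows. The sketch assumes that each clonotype has a level (reads per million) in every training sample, that outcome is coded as 1 (allergic response) or 0 (no response), and that a Pearson correlation at or above a hypothetical `threshold` marks a clonotype as correlating. The clonotype labels, the data, and the threshold are made up for the example; the description above does not specify a particular correlation statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def split_clonotypes(levels, outcomes, threshold=0.8):
    """Partition clonotypes by how strongly their level tracks the outcome."""
    correlating, non_correlating = [], []
    for clonotype, per_sample in levels.items():
        r = pearson(per_sample, outcomes)
        if r >= threshold:
            correlating.append(clonotype)
        else:
            non_correlating.append(clonotype)
    return sorted(correlating), sorted(non_correlating)


# Made-up levels (reads per million) across four training samples.
levels = {
    "TRBV5-1|CASSLGQ|TRBJ2-7": [900.0, 850.0, 40.0, 30.0],
    "TRBV7-9|CASSPDR|TRBJ1-1": [100.0, 120.0, 110.0, 95.0],
}
outcomes = [1, 1, 0, 0]  # 1 = allergic response observed, 0 = none

correlating, non_correlating = split_clonotypes(levels, outcomes)
print(correlating, non_correlating)
```

Once the two classes are separated in this way, parameters such as sequence motif or level can be compared between them.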

The profile data from the blood samples is then used to predict the allergy state. Given the known outcomes in this training study, one can devise a model using standard cross-validation techniques that generates an Allergy Score given the level of the different clonotypes. Given the profile in a new blood sample of TCR (and/or BCR) at a specific point, an Allergy Score can be generated to estimate the degree to which this patient is prone to mount an allergic response.
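By way of illustration only, a cross-validated scoring model might be sketched as follows. The sketch assumes a deliberately simple model: each sample is reduced to one feature (the summed level of its correlating clonotypes), the decision threshold is the midpoint between the class means, and the Allergy Score is the ratio of the feature to that threshold. Leave-one-out cross-validation is used to estimate accuracy. The model form, the function names, and the data are assumptions for the example and are not the specific algorithm contemplated above.

```python
def fit_threshold(features, labels):
    """Midpoint between the mean feature of responders (1) and non-responders (0)."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("training data must contain both outcome classes")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def allergy_score(feature, threshold):
    """Illustrative score: values above 1.0 suggest an allergy-prone state."""
    return feature / threshold


def leave_one_out_accuracy(features, labels):
    """Hold out each sample in turn, refit on the rest, and test the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_f = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        threshold = fit_threshold(train_f, train_y)
        predicted = 1 if features[i] > threshold else 0
        correct += int(predicted == labels[i])
    return correct / len(features)


# Made-up training data: summed correlating-clonotype level per blood sample.
features = [1750.0, 1600.0, 1900.0, 70.0, 120.0, 90.0]
labels = [1, 1, 1, 0, 0, 0]

accuracy = leave_one_out_accuracy(features, labels)
threshold = fit_threshold(features, labels)
new_sample_score = allergy_score(1500.0, threshold)
print(accuracy, new_sample_score)
```

In practice a richer model over many clonotype levels would likely be used; the held-out evaluation shown here is the part that carries over.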

Discovery of Correlating Clonotypes Using a Calibration Test

In another embodiment a method of identifying correlating clonotypes can be implemented using a calibration test for each patient. This method involves taking a biopsy sample from a site with a positive allergen response in the patient. This can be from the initial allergy test that was performed to determine the specific allergen the patient is responding to, or a sample from the site of any further treatment injections. This can be done more than once to ensure that the appropriate clonotypes are being followed in case there is some epitope spreading. TCR and/or BCR from these biopsy samples can be used to identify the correlating clonotypes, defined as those that are prevalent in this sample. This set of clonotypes can then be followed in blood and a score is generated for the likelihood of allergy response. The algorithm to generate the Allergy Score is derived through a discovery study that is similar to the one described above that utilizes the available clinical data and the levels of the correlating clonotypes to generate an Allergy Score that estimates the allergy state.
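By way of illustration only, the calibration approach might be sketched as follows. The sketch assumes that "prevalent" means a clonotype accounts for at least a hypothetical `min_fraction` of the biopsy reads, and that following the clonotypes in blood means reporting their levels in reads per million. The clonotype identifiers, the counts, and the cutoff are made up for the example.

```python
def prevalent_clonotypes(biopsy_counts, min_fraction=0.10):
    """Clonotypes making up at least min_fraction of the biopsy reads."""
    total = sum(biopsy_counts.values())
    return {c for c, n in biopsy_counts.items() if n / total >= min_fraction}


def reads_per_million(blood_counts, tracked):
    """Level of each tracked clonotype in a blood sample, in reads per million."""
    total = sum(blood_counts.values())
    return {c: 1e6 * blood_counts.get(c, 0) / total for c in sorted(tracked)}


# Made-up read counts from a biopsy at a site with a positive allergen response.
biopsy_counts = {"clone_A": 600, "clone_B": 300, "clone_C": 50, "clone_D": 50}
tracked = prevalent_clonotypes(biopsy_counts)

# Made-up read counts from a later blood sample of the same patient.
blood_counts = {"clone_A": 30, "clone_E": 99970}
blood_levels = reads_per_million(blood_counts, tracked)
print(tracked, blood_levels)
```

The resulting blood levels are the inputs from which an Allergy Score would be computed. Repeating the biopsy step at a later time would simply enlarge or revise the tracked set.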

Discovery of Correlating Clonotypes Using a Calibration Test and a Population Study

In another embodiment, the identification of the correlating clonotypes can be achieved through a combination of the above approaches. Specifically, this can be achieved by using the population study to generate an algorithm to predict correlating clonotypes. In addition, it can be achieved through calibration data from the same patient using biopsy from a site with a positive allergen response. A more preferred embodiment will employ both approaches: population-built algorithm and individual calibration to most accurately identify the correlating clonotypes. An Allergy Score is then generated using the level of these clonotypes to predict the state of allergy through the use of the population study as a training set.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

1-34. (canceled)
 35. The method of claim 65, wherein said disease is an autoimmune disease and said one or more correlating clonotypes are present in a peak state of the disease, wherein the peak state of the disease is a flare state of the autoimmune disease.
 36-39. (canceled)
 40. The method of claim 65, wherein said T-cells and/or B-cells comprise a subset of T-cells and/or B cells.
 41. The method of claim 40, wherein said subset of T-cells and/or B-cells are enriched by interaction with a marker.
 42. The method of claim 41, wherein said marker is a cell surface marker on the subset of T-cells and/or B-cells.
 43. The method of claim 40, wherein said subset of T-cells and/or B-cells interact with an antigen specifically present in the disease.
 44. The method of claim 35, wherein the disease is systemic lupus erythematosus or multiple sclerosis.
 45-64. (canceled)
 65. A method for monitoring an autoimmune disease, an infectious disease or a cancer of an individual, the method comprising: (a) obtaining a sample from the individual comprising T-cells and/or B-cells; (b) spatially isolating individual molecules of nucleic acids from said cells of the sample, the individual molecules of nucleic acid comprising sequences of complementary determining region 3 (CDR3) from T-cell receptor genes or immunoglobulin genes; (c) sequencing the spatially isolated individual molecules of the nucleic acids to provide a full repertoire of CDR3 sequences; and (d) monitoring the autoimmune disease, infectious disease or cancer by determining levels of correlating clonotypes in the full repertoire of CDR3 sequences from the sample.
 66. The method of claim 65 wherein each of said full repertoire of CDR3 sequences comprises at least 1000 sequence reads each comprising at least 30 bp.
 67. The method of claim 66 further including a step of amplifying said individual molecules of nucleic acid prior to said step of spatially isolating said individual molecules.
 68. The method of claim 67 wherein said CDR3 sequences each include a V segment and wherein said step of amplifying includes amplifying in a polymerase chain reaction using primers specific for each of the V segments.
 69. The method of claim 66 wherein said one or more correlating clonotypes are distinguished among said full repertoire of CDR3 sequences by V, D, and J segments used.
 70. The method of claim 66 wherein said sequences of CDR3 are from said immunoglobulin genes and have a level of somatic mutations and wherein said one or more correlating clonotypes are distinguished among said full repertoire of CDR3 sequences by the level of somatic mutations in each of such clonotypes.
 71. The method of claim 66 wherein said sequences of CDR3 each have a level within said full repertoire of CDR3 sequences and wherein said one or more correlating clonotypes are distinguished among said full repertoire of CDR3 sequences by the level of each of such clonotypes.
 72. The method of claim 66 wherein said monitoring is for a recurrence of a cancer and wherein said correlating clonotypes are from an immune response to the cancer.
 73. The method of claim 65 wherein each of said samples comprises at least 10,000 B-cells and/or T-cells.
 74. The method of claim 72 wherein said sample is blood.
 75. The method of claim 65 wherein said sample comprises cell-free DNA or RNA.
 76. The method of claim 65 wherein said sample comprises at least 10⁵ B-cells or at least 10⁵ T-cells and wherein said step of sequencing comprises at least 10⁶ reads per run.
 77. A method for monitoring an autoimmune disease, an infectious disease or a cancer of an individual, the method comprising: (a) obtaining a sample of nucleic acids from T-cells and/or B-cells of an individual; (b) spatially isolating individual molecules of nucleic acids from said cells of the sample, the individual molecules of nucleic acid comprising sequences of complementary determining region 3 (CDR3) from T-cell receptor genes or immunoglobulin genes; (c) sequencing the spatially isolated individual molecules of the nucleic acids to generate sequence reads of CDR3 sequences; (d) coalescing the sequence reads into clonotypes to form a clonotype profile; and (e) monitoring the autoimmune disease, infectious disease or cancer by determining levels of correlating clonotypes in the clonotype profile.
 78. The method of claim 77 wherein said step of sequencing comprises sequencing by synthesis said spatially isolated individual molecules.
 79. The method of claim 78 wherein said sequencing by synthesis includes using reversibly terminated labeled nucleotides.
 80. The method of claim 78 wherein said step of sequencing comprises at least 1000 reads per run.
 81. The method of claim 77 further including a step of amplifying said individual molecules of nucleic acid prior to said step of spatially isolating said individual molecules.
 82. The method of claim 81 wherein said step of sequencing comprises at least 1000 reads per run.
 83. The method of claim 77 wherein said step of spatially isolating individual molecules includes spatially isolating said individual molecules of said nucleic acids in two dimensions on a solid substrate.
 84. The method of claim 77 wherein said sample comprises at least 10⁵ B-cells or at least 10⁵ T-cells and wherein said step of sequencing comprises at least 10⁶ reads per run.
 85. A method for monitoring an autoimmune disease, an infectious disease or a cancer of an individual by determining levels of correlating clonotypes of the disease, the method comprising: (a) obtaining a sample from the individual comprising at least 10,000 T-cells and/or B-cells; (b) amplifying molecules of nucleic acid from said cells of the sample, the molecules of nucleic acid comprising sequences of complementary determining region 3 (CDR3) from T-cell receptor genes or immunoglobulin genes; (c) spatially isolating individual molecules of the amplified nucleic acids; (d) sequencing the spatially isolated individual molecules of the amplified nucleic acids to provide a full repertoire of CDR3 sequences; and (e) monitoring said disease by determining levels of the one or more correlating clonotypes among the full repertoire of CDR3 sequences in the sample.
 86. The method of claim 85 wherein said sample is blood.
 87. The method of claim 85 wherein each of said full repertoire of CDR3 sequences comprises at least 1000 sequence reads each comprising at least 30 bp.
 88. The method of claim 85 wherein said full repertoire of CDR3 sequences comprise a full repertoire of CDR3 sequences of T-cell receptor β genes of said individual.
 89. The method of claim 85 wherein said sample comprises at least 10⁵ B-cells or at least 10⁵ T-cells and wherein said step of sequencing comprises at least 10⁶ reads per run.
 90. The method of claim 85 wherein said step of sequencing comprises generating at least 1000 reads per run and coalescing reads to form clonotypes.
 91. The method of claim 85 wherein said step of sequencing comprises sequencing by synthesis said spatially isolated individual molecules.
 92. The method of claim 35 further including said step of treating said individual with a drug based on said levels of said one or more correlating clonotypes.
 93. The method of claim 35 wherein said autoimmune disease is rheumatoid arthritis and wherein said step of treating includes treating said individual with anti-inflammatory drugs or DMARDs.
 94. The method of claim 35 wherein said autoimmune disease is ankylosing spondylitis and wherein said step of treating includes treating said individual with a drug selected from the group consisting of anti-inflammatory drugs, DMARDs, or TNFα blockers. 